Preface


Pedra, paper i tisores 
un, dos, tres 

Jan Ken Pon 

A-i ko de sho! 


World RPS Society [437] 


Information fusion is a broad area that studies methods to combine data or 
information supplied by multiple sources. Aggregation operators are some of 
the functions that can be used for combining data. 

This book is intended for those interested in methods for aggregating
information and, especially, for those who need to embed such methods in
applications. It constitutes an introduction to the field. The main focus is on
functions that deal with numerical information, although other kinds of
functions (especially ones for ordinal scales) are considered as well. It is
aimed at senior undergraduate and beginning graduate students of computer
science, engineering, and mathematics.

This is an introductory book in the field of aggregation operators, focused 
on practical applications; we have tried, on the one hand, to limit the operators 
and results to a set of manageable size and, on the other hand, to include some 
descriptions and examples of such operators at work. 

We have also included a few computational issues. It has to be said that 
although for most operators no implementation details are given, their im- 
plementation is usually straightforward. Most of the operators and methods 
appearing in the book have been implemented by the authors (in Java). 

Given this objective, results of mainly mathematical interest are not
included in the text. For example, only aggregation operators that combine
a finite number of inputs have been studied in detail. Some definitions and
results that can be useful for further study but are not relevant for real
applications have been included in separate figures. This is the case for
definitions of fuzzy integrals of continuous functions.


Organization 


The book contains an introductory chapter, two chapters presenting some 
other introductory topics, and the main chapters. 

The Introduction describes information integration at large, and locates 
aggregation operators in this setting. 

Chapter 2 describes some of the tools that are needed later in the book. In 
particular, it focuses on measurement theory, probability and statistics, and 
fuzzy sets. 

Chapter 3 gives an introduction to functional equations. Some well-known 
equations are reviewed, and a few notes on how to solve them are given. 

Chapter 4 is devoted to the synthesis of judgements. It mainly reviews
aggregation operators related to separability and quasi-arithmetic means, first
without weights and then with them. At this point, Bajraktarević's mean
is defined. A few operators for ordinal scales are also presented.

Chapter 5 gives an overview of fuzzy measures. The best-known families
are studied: belief and plausibility measures, ⊥-decomposable fuzzy measures,
and distorted probabilities. Such fuzzy measures are later used in conjunction
with fuzzy integrals.

Chapter 6 describes aggregation operators that can be expressed as par- 
ticular cases of fuzzy integrals. Such operators include weighted means, OWA 
operators and weighted minimum and maximum. Fuzzy integrals, such as 
Choquet, Sugeno, t-conorm, and twofold integrals, are also defined and com- 
pared. 

Chapter 7 is devoted to a few indices to evaluate aggregation operators
and their parameters. This chapter includes descriptions of the Shapley and
Banzhaf indices, interactions, average values, and orness.

We finish, in Chapter 8, by considering the process of parameter
determination for some particular operators, for example, learning weights for
the weighted mean and fuzzy measures for Choquet integrals. Two cases are
considered: parameter determination with the help of an expert and parameter
determination from examples.

To ease the reading, references have been grouped in bibliographical sec- 
tions (Bibliographical Notes, at the end of each chapter). The full listing of 
the references is given at the end of the book. Examples have been given to il- 
lustrate the operators, and figures and tables have been included for the same 
purpose. In some cases, figures have been added to include some definitions or 
properties that have less interest for practical application (e.g., definitions of 
some fuzzy integrals in continuous domains). The book finishes with an Ap- 
pendix where the main properties and some aggregation operators are listed. 
The lists are not exhaustive.


How to Use This Book


The book does not assume specific previous knowledge of aggregation
operators, and Chapters 2 and 3 give some preliminaries to make it
self-contained. Although the chapters have been written to avoid dependencies
as much as possible, there are some dependencies between chapters. The most
important relationships are enumerated here. Chapter 4 uses functional
equations reviewed in Chapter 3, and Chapter 6 defines fuzzy integrals that use
the fuzzy measures described in Chapter 5. Evaluation methods (Chapter 7) are
based on the particular operators and the particular parameters explained in
previous chapters (e.g., Shapley value for a fuzzy measure). The problem of
parameter determination for a given operator (Chapter 8) naturally needs the
operator under consideration (described in previous chapters). Nevertheless,
to prevent the reader from going back and forth, there are minor repetitions
in the text.
The following equation is the most repeated one:


$$\min_i a_i \le C(a_1, \ldots, a_N) \le \max_i a_i$$


Introduction


Pregada fo Natana per totes les dones 
que digués la manera segons la qual 
per art poguessen atrobar a eleger 

la dona qui es millor a abadessa. 

Per a aquella manera dix Natana es 
atrobada veritat per la qual veritat 
porem atrobar aquella dona qui es pus 
cuvinent e mellor a esser abadessa.¹


Ramon Llull, [234] (p. 80)

¹ Natana was asked by all the sisters to describe the method according to which,
with the system, one can find and elect the sister who is suited best to be abbess.
(...) “By this method,” said Natana, “is found the truth; by this truth we will be
able to find the sister who is most suitable and best to be our abbess.” Translation
from [176].


Information fusion techniques, in general, and aggregation operators (or
aggregation functions), in particular, are extensively used in several fields of
human knowledge. They are used to produce the most comprehensive and specific
datum about an entity from data supplied by several information sources (or the
same source at different periods of time). They are used in systems to reduce
some type of noise, increase accuracy, summarize information, extract
information, make decisions, and so on. To illustrate this, we consider below
some examples in different fields. Some of the typical applications are also
included.


Economics: Aggregation techniques are used to define price indices
such as the Retail Price Index (RPI) and, in general, to summarize any
kind of economic information. Listings of countries or companies, where
individuals are ordered according to their ranking with respect to several
criteria, are frequently published in journals and newspapers. Examples
are the Human Development Index (HDI), which is an average of the life
expectancy index, the educational attainment index, and the adjusted real
gross domestic product (GDP) per capita.

Biology: Methods to fuse sequences of DNA and RNA are used in several 
applications. Aggregation operators have also been developed to combine 
information about taxonomies (classifications of species). More specifi- 
cally, methods exist to combine dendrograms (tree-like structures) and 
partitions. 

Education: Aggregation operators are extensively used in education for as- 
sessing students’ knowledge in a given subject or to assign them an overall 
rating for several subjects. Different methods are used in different coun- 
tries, according to tradition and to the scale used when giving grades 
(both numerical and ordinal). Scores for evaluating educational institu- 
tions (e.g., universities) are another example of the use of aggregation 
operators. 

Computer Science: Aggregation operators are used for different purposes. 
On the one hand, we have artificial intelligence applications, which are 
commented on in more detail below. On the other hand, we have decision 
making procedures that are applied, for example, to evaluate and select 
hardware and software. 


Within artificial intelligence, information fusion is also widely applied, and
its use is rapidly increasing as more complex systems are being developed. For
example, its uses in robotics (e.g., fusion of data provided by sensors), vision
(e.g., fusion of images), knowledge-based systems (e.g., decision making in
a multicriteria framework, integration of different kinds of knowledge, and
verification of the correctness of knowledge-based systems), and data mining
(e.g., ensemble methods) are well known. Recent advances in multiagent systems
extend the range of information fusion applications to systems where an agent
needs to consider the behavior of other agents to make decisions on the basis
of distributed information.

Although the number of information fusion applications in artificial intel- 
ligence is large, it can be said that there are only two ultimate goals. They are 
(i) to make decisions and (ii) to have a better understanding of the application 
domain. We describe them in more detail below: 


Decision making: This consists either of selecting the best alternative (al- 
ternative selection) or building one new alternative (or solution) from a 
set of them (alternative construction). 

• In alternative selection, fusion is used to evaluate the alternatives. A
typical situation is one where there is a set of alternatives and each
is evaluated against several criteria (this situation corresponds to the
multicriteria decision making, MCDM, problem). For example, when
a buying agent has received several offers and wants to select the best
one, it needs to consider the best price, the best quality, and so on.
This situation can be modeled in terms of several preferences (or utility
functions) or by using a single but multivalued preference. That is, for
each offer, we consider the degree of satisfaction in terms of price,
quality, and so on. Figure 1.1 illustrates the case in point. The figure
includes several criteria $c_1, \ldots, c_N$ for each alternative.

(a) Criteria (satisfaction on)        (b)                  (c)
alt   | Price  Quality  Comfort       alt   | Consensus    alt   | Ranking
FordT | 0.2    0.8      0.3           FordT | 0.35         206   | 0.72
206   | 0.7    0.7      0.8           206   | 0.72         FordT | 0.35

Fig. 1.1. Decision making: (a) multicriteria or multivalued preferences; (b)
aggregation of degrees of satisfaction (aggregation of preferences) and
construction of the global degree of satisfaction; (c) ranking of the
alternatives according to the global degree of satisfaction (preferences)

The alternative selection problem is usually solved in a two-stage
process:


(i) For each decision alternative, aggregate the degrees of satisfaction 
of all criteria. In this way, we obtain for each alternative a single 
aggregated value that corresponds to a global degree of satisfaction. 

(ii) Rank the alternatives with respect to the global degree of satis- 
faction. 

It is clear that the cornerstone of the process is the aggregation 
method used in the first stage. Figure 1.1 illustrates the whole process. 

Systems modeling group decisions also fit in this class of alternative 
selection problems. In this case, different experts in a group have dif- 
ferent opinions and the goal is to obtain some consensus. This field of 
study is known as group decision making (GDM). 
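
The two-stage process described above (aggregate, then rank) can be sketched as follows. This is only an illustration: the degrees of satisfaction are taken to be those of Figure 1.1 (read as 0.2, 0.8, 0.3 and 0.7, 0.7, 0.8), the function name `rank_alternatives` is hypothetical, and the arithmetic mean is an assumed choice of aggregation operator. The consensus values shown in Figure 1.1 come from an operator that is not specified at this point, so the global degrees computed here differ from the figure although the resulting order is the same.

```python
from statistics import mean

# Degrees of satisfaction per alternative and criterion (read from Figure 1.1).
satisfaction = {
    "FordT": {"price": 0.2, "quality": 0.8, "comfort": 0.3},
    "206": {"price": 0.7, "quality": 0.7, "comfort": 0.8},
}


def rank_alternatives(satisfaction, aggregate=mean):
    """Stage (i): aggregate the degrees of each alternative into one value.
    Stage (ii): rank the alternatives by that global degree, best first."""
    scores = {
        alt: aggregate(list(degrees.values()))
        for alt, degrees in satisfaction.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# The aggregation operator is an assumption; any operator C could be passed in.
ranking = rank_alternatives(satisfaction)
for alt, score in ranking:
    print(f"{alt}: {score:.3f}")
```

Passing a different `aggregate` function changes only stage (i), which reflects the remark that the aggregation method is the cornerstone of the process.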

• In alternative construction, fusion corresponds to the whole process
of building a new alternative from the original ones. It is important
to underline that it is often the case that the alternatives correspond
to partial solutions and that different alternatives might be
incomparable or mutually incompatible. This process has to consider the
importance and the reliability of the alternatives, their constraints, and
the approaches used when building them. Algorithms for plan merging
and ensemble methods in machine learning can be studied from this
perspective.

Plan merging consists of integrating partial plans to build a more
complex one. In the integration process the preconditions and the
effects of each partial plan have to be considered, as they define
constraints on the order in which the partial plans can be executed. For
example, the plan for tightening a nut cannot be applied after
assembling.

Ensemble methods consist of building several models from examples
and then combining them to define a new one. The new model is
intended to be more reliable and to have a lower error than each of the
original ones. Figure 1.2 illustrates this case: in a classification problem,
several classifiers or models $M_i$ are constructed from a set of examples
using, for example, different supervised machine learning techniques. Then,
for a particular instance (or situation) $x$, all models are applied, each
giving a solution (or class) $M_i(x)$. In the figure, each model leads to
A, B, or C. Then, the solution of the whole system, denoted by
$M(x)$, estimates the class of the instance $x$. This is computed from the
classes $M_i(x)$ by applying some consensus procedure $C$. This procedure
strongly depends on how the $M_i(x)$ are represented. In the problem
represented in Figure 1.2, we can use the voting procedure as the consensus
procedure $C$.

Fig. 1.2. Ensemble methods for classification: x represents the instance to be
classified, and C represents a method for aggregating partial solutions
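
As a minimal sketch of the voting procedure mentioned above, the following code combines the classes returned by several models into one class by plurality. The three models are hypothetical stand-ins (simple threshold rules, not trained classifiers), and the tie-breaking rule, which favours the class seen first, is an arbitrary assumption of this sketch.

```python
from collections import Counter


def vote(labels):
    """Plurality consensus C: return the class that appears most often.
    Ties go to the class encountered first (an arbitrary choice)."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical stand-ins for the models M_i: each maps an instance x to a class.
models = [
    lambda x: "A" if x < 0.5 else "B",
    lambda x: "A" if x < 0.3 else "C",
    lambda x: "B" if x > 0.4 else "A",
]


def ensemble(x):
    """M(x) = C(M_1(x), ..., M_n(x)) with C being the voting procedure."""
    return vote([model(x) for model in models])


print(ensemble(0.2))  # all three models agree
print(ensemble(0.6))  # two of the three models agree
```

Voting works here because each $M_i(x)$ is a class label. If the models returned numerical scores or probability distributions, a different consensus procedure would be needed.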


Improving the understanding of the application domain: A system
working solely with data obtained from a single source of information faces
several inconveniences caused by insufficient data quality. In particular,
we underline the following difficulties: (i) lack of accuracy of the supplied
data due to errors caused by the information source (either intentional
or accidental) or due to errors in transmission; (ii) lack of reliability of
the sources; (iii) information that is too narrow in relation to the working
domain (the information only describes a part of the application's
domain).

To deal with these problems, information fusion techniques can be used.
The techniques can increase the reliability of the system, improving its
data quality and extending its domain of application. In fact, in some
circumstances, such techniques permit the extraction of features that are
impossible to perceive from individual sources. Extraction of 3D
representations of objects from several images corresponds to this case.

Note that in the setting of improving the quality of the data, information 
fusion can be applied at the time the system is built or at runtime (for 
example, by combining the newly acquired information with the previously 
established one). Knowledge revision can be seen from this perspective. 


Although information fusion is a useful tool for improving the
capabilities of intelligent systems, it is important to underline that
difficulties arise in its use because the data to be fused are frequently not
comparable and sometimes inconsistent. Therefore, systems have to embed simple
fusion techniques in larger software tools so that results are consistent.
These issues are described in more detail in the next section.


1.1 Fusion and Integration 


This section defines some of the terms in the field of information fusion and
integration.

In Section 1.2 we present a general architecture for information integra- 
tion based on the processes commonly admitted in multisensor fusion and in- 
tegration. Information integration is considered here as a general framework 
that embeds information fusion. This follows the approach in the sensor field, 
where multisensor fusion and multisensor integration are also differentiated. 
Additionally, we shall use the term aggregation operators to refer to concrete 
mathematical functions. According to this, we describe the terms information 
integration, information fusion, and aggregation operators as follows. 


Information integration: This corresponds to the use of information from 
several sources (or from the same source but obtained at different times) 
to accomplish a particular task. 

Information fusion: Information integration requires particular techniques 
for combining the information. Information fusion is the actual process of 
combining these different data into one single datum. Therefore, informa- 
tion fusion refers to particular mathematical functions, algorithms, meth- 
ods, and procedures for data combination. According to this, information 
fusion is one of the processes embedded in an information integration 
architecture. In the following, we will use combination as a synonym of 
fusion. 

Aggregation operators: These operators (also referred to as means or
mean operators) correspond to particular mathematical functions used for
information fusion. Generally, we consider mathematical functions that
combine N values in a given domain D (e.g., N real numbers) and return
a value in the same domain (e.g., another real number). Denoting these
functions by C (from Consensus), aggregation operators are functions of
the form:

$$C : D^N \to D$$

Usually, operators fuse input values taking into account some information
about the sources (data suppliers). That is, operators are parametric
so that additional knowledge (background knowledge, following artificial
intelligence jargon) on the sources can be considered in the fusion process.
We express this by $C_P$, where P represents the parameters of C.

Unanimity or idempotency: $C(a, \ldots, a) = a$ for all $a$
Monotonicity: $C(a_1, \ldots, a_N) \ge C(a'_1, \ldots, a'_N)$ when $a_i \ge a'_i$
Symmetry: For any permutation $\pi$ on $\{1, \ldots, N\}$ it holds that
$C(a_1, \ldots, a_N) = C(a_{\pi(1)}, \ldots, a_{\pi(N)})$

Fig. 1.3. Main properties of aggregation operators

As an example, we can consider the arithmetic mean as one such
aggregation operator:

$$C(a_1, \ldots, a_N) = \sum_{i=1}^{N} a_i / N$$
This expression does not include any information on the data suppliers.
In contrast, the weighted mean is another aggregation operator that includes
a weight for each data supplier:

$$C_p(a_1, \ldots, a_N) = \sum_{i=1}^{N} p_i a_i / N$$

Here, $p_i$ is the weight/relevance for the source supplying datum $a_i$.

Aggregation operators are usually required to satisfy unanimity (defined
in Figure 1.3) and, when D is an ordinal scale, monotonicity. The
two properties imply that aggregation operators are functions that yield
a value between the minimum and the maximum of the input values:
by monotonicity, the result lies between the values obtained when all inputs
are replaced by their minimum and by their maximum, and by unanimity
those values are the minimum and the maximum themselves.
Formally, they are operators C that satisfy internality:

$$\min_i a_i \le C(a_1, \ldots, a_N) \le \max_i a_i \qquad (1.1)$$

Moreover, in some circumstances symmetry is also required. Here,
symmetry stands for the fact that the order of the arguments is not relevant.
In other words, no source is distinguished from the others.

From this point of view, it is clear that all aggregation operators are
information fusion methods. However, only information fusion methods
with a straightforward mathematical definition are considered here as
aggregation operators. Therefore, not all information fusion methods are
aggregation operators. In particular, methods with a complex operational
definition (e.g., complex computer programs) are not considered in this
book as such. Naturally, the division between both terms is rather fuzzy.

[Figure: stages Acquisition and Preprocessing (one pair per information
source), Data model, Fusion, Execution]

Fig. 1.4. General architecture for data fusion
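
The arithmetic mean, the weighted mean, and the properties of unanimity, internality, and symmetry discussed above can be illustrated with a short sketch. The function names and the sample data are hypothetical. The weighted mean is normalized here by the sum of the weights, which is an assumption of this sketch; it coincides with dividing by N whenever the weights sum to N.

```python
from itertools import permutations


def arithmetic_mean(values):
    """C(a_1, ..., a_N) = sum_i a_i / N."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean, normalized by the sum of the weights (an assumption
    made here so that unanimity holds for any positive weights)."""
    return sum(p * a for p, a in zip(weights, values)) / sum(weights)


def is_internal(C, values, tol=1e-12):
    """Internality (1.1): min a_i <= C(a_1, ..., a_N) <= max a_i."""
    return min(values) - tol <= C(values) <= max(values) + tol


def is_symmetric(C, values, tol=1e-12):
    """Symmetry: the result is the same for every ordering of the inputs."""
    reference = C(values)
    return all(abs(C(list(p)) - reference) <= tol for p in permutations(values))


data = [0.2, 0.8, 0.3]
weights = [3.0, 1.0, 1.0]  # hypothetical relevance of each source


def wm(values):
    """Weighted mean with the fixed hypothetical weights above."""
    return weighted_mean(values, weights)


print(arithmetic_mean(data), wm(data))
print(is_internal(arithmetic_mean, data), is_internal(wm, data))
print(is_symmetric(arithmetic_mean, data), is_symmetric(wm, data))
```

With unequal weights the weighted mean is internal and unanimous but not symmetric, since the first source counts more than the others.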


1.2 An Architecture for Information Integration 


Now, we turn to a general architecture for information integration. This ar- 
chitecture (see Figure 1.4 for a graphical representation) distinguishes the 
following stages. 


1. Acquisition: The first stage corresponds to the process of gathering in- 
formation from the information sources. This stage is also called detection. 
In order to have good data quality, a good source model is required, that 
is, a model of the uncertainty and error of the sources, so that it is possible 
to have a measure of the quality of the information. This measure can then 
be used in the fusion process so that it takes into account the reliability of 
the sources. The requirement of a source model is needed when combining 
sensory information (sensor model to determine the reliability of a partic- 
ular sensor) or human (symbolic) knowledge (to determine, for example, 
if the supplied information is within the expert’s domain of expertise or 
belongs to some general knowledge). 

Following the analogy with multisensor fusion, acquisition can be pas- 
sive (when the information recorded is already present in the surroundings 
of the system) or active (when the information recorded is a consequence 
of an action initiated by the system). 

2. Preprocessing: This second stage consists of preparing the data for the
fusion process (i.e., of making data computationally appropriate). Several
procedures are encompassed in this stage and they range from simple (like
noise reduction and sensor recalibration procedures) to complex (edge
detection and filtering methods). Procedures are considered preprocessing
as long as they only use the information of a single information source.
This stage also includes procedures for making data commensurable and
for solving the registration problem. Several pieces of data are
commensurable when they refer to the same position in space and instant in time.
The registration problem corresponds to the determination of information
from each sensor that refers to the same features in the environment.

Aspects of both data commensurability and registration are relevant 
not only when we are considering numerical data from sensors but also 
for other kinds of data and applications, e.g., symbolic data in knowledge 
elicitation. In this latter setting, knowledge either elicited from experts or 
(automatically) extracted from databases has to be commensurate with 
other information before being integrated. Otherwise, results will not be 
meaningful. 

3. Fusion: Once data are preprocessed and, thus, are commensurable, they
can be fused. At this stage, aggregation operators or more complex fusion
methods are applied to obtain a new datum. Typically, all input data
use the same representation formalism, which is also used to represent
the outcome of the system. For example, the outcome of fusing a set of images
is another image. Nevertheless, some systems depart from this approach, for
example, when different input data use different formalisms. Then, instead of
direct fusion, one source can be used to guide or cue the data from other
sources. This is referred to as guiding or cueing and constitutes indirect
fusion. The case of visual information guiding the operation of a tactile
array mounted on the end of a manipulator is an example of this situation.

4. Execution: Appropriate procedures are applied using the datum ob- 
tained in the fusion stage. Two kinds of procedures can be distinguished: 
action application and data interpretation. Control systems correspond 
to the first case. They use the outcome of the fusion process to decide 
what action to take. Exploratory robots might correspond to the second 
case, as they will analyze the new data and add them to their knowledge 
base. This case corresponds to a world model revision because the system 
modifies the state of its own model for the operating environment. Again, 
this classification is rather fuzzy, as the analysis of the data can change 
the behavior of the robot. 


Note that all the procedures and functions that participate in this
architecture are task-specific and, thus, change according to the application.
For example, a decision making process in a multicriteria environment requires
first the acquisition of the values for each criterion. Next, the preprocessing
stage consists of the normalization of the data or data translation into a
uniform space (e.g., the [0,1] interval). Then, the fusion is applied using a
particular aggregation operator and, finally, a decision is made selecting the
alternative that is best rated. In contrast, fusion for obstacle detection in a
robot navigation system requires different procedures: gathering sensor data,
making them commensurable, fusion, and, finally, raising an alarm if an
obstacle is found. Nevertheless, although different procedures are used, the
stages above apply to all such cases.
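
The four stages, instantiated for the multicriteria decision example just described, can be sketched as a small pipeline. All names and data are hypothetical: the offers, the price and quality scales, and the use of the arithmetic mean in the fusion stage are assumptions chosen only to make the stages concrete.

```python
def acquire():
    """Acquisition: raw values per alternative and criterion (hypothetical data).
    Price is in currency units (lower is better); quality is on a 0-10 scale."""
    return {
        "offer_a": {"price": 900.0, "quality": 6.0},
        "offer_b": {"price": 1200.0, "quality": 9.0},
        "offer_c": {"price": 1500.0, "quality": 7.0},
    }


def preprocess(raw):
    """Preprocessing: map every criterion to a degree of satisfaction in [0, 1].
    Assumes that at least two offers have different prices."""
    prices = [values["price"] for values in raw.values()]
    lo, hi = min(prices), max(prices)
    return {
        alt: {
            "price": (hi - values["price"]) / (hi - lo),  # cheaper is closer to 1
            "quality": values["quality"] / 10.0,
        }
        for alt, values in raw.items()
    }


def fuse(degrees):
    """Fusion: arithmetic mean of the degrees of each alternative."""
    return {alt: sum(d.values()) / len(d) for alt, d in degrees.items()}


def execute(scores):
    """Execution: select the best-rated alternative."""
    return max(scores, key=scores.get)


scores = fuse(preprocess(acquire()))
decision = execute(scores)
print(scores, decision)
```

An obstacle detection system would keep the same four-stage shape and replace each function: sensor readings instead of offers, registration instead of normalization, and raising an alarm instead of selecting an alternative.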


1.3 Information Fusion Methods 


In the previous section we have focused on fusion processes and their role in 
an information integration architecture. We now turn to the fusion methods 
themselves. 

Information fusion methods can be studied from different perspectives.
In the rest of this section, we describe some of the dimensions in use for
classifying them. To some extent, this classification is independent of the type
of information source used (sensor or expert) and whether all the information
is acquired at the same instant or at different times.


Type of information: Two main categories are distinguished. They
correspond to redundant and complementary information.

• Redundant information occurs when several information sources
describe the same features in the environment. Differences in the data,
expected to be small, are due to the sources' lack of reliability.
Redundant data are fused to reduce uncertainty and increase data
accuracy.

• Complementary information corresponds to the case of sources
describing different features of the environment (different subspaces).
Different data describe different characteristics that are not similar.
Fusion is applied so that the system model covers all subspaces.

Type of data representation: A basic consideration for any aggregation 
operator or fusion method is the type of data it is going to fuse. At present, 
there exists a large number of aggregation operators applicable to a broad 
range of data representation formalisms. For example, aggregation oper- 
ators on the following formalisms have been considered in the literature: 
numerical data, ordinal scales, fuzzy sets, belief functions, dendrograms, 
DNA sequences, among others. In fact, any kind of data representation 
formalism is adequate for applying fusion techniques because the plurality 
rule (mode or voting) can be applied to data of almost any type. 

Level of abstraction: Due to the information flow within systems
(low-level data is transformed into high-level information), fusion techniques
can often be applied at different levels of abstraction. For example, in a
multisensor fusion system for tank detection, the following levels can be
distinguished: signal, pixel, feature, and symbol. Similarly, in a
knowledge elicitation problem using data from multiple experts, fusion can be
performed either at the matrix level (directly on the raw data supplied
by the expert) or at the similarity level (using similarities extracted from
experts' raw data).

When several levels can be considered, the selection of the appropriate
level depends on the information available. It is usually the case that
redundant information is fused at low levels because two pieces of redundant
information are usually similar in structure. In contrast, complementary
information is usually fused at higher levels of abstraction, as pieces of
information are not so similar. For example, in the case of the tank
detection system, data from two radars will be fused at the signal level (low
level) if both measure the same property at the same time. In contrast,
the data from a radar and a radio signal detector should be fused at the
symbol level (high level), as in this case the data gathered by the two
data suppliers are of a completely different nature and, thus, only the
elaborated conclusions (for example, whether data seem to indicate the
presence or absence of a tank) can be combined. Nevertheless, there are
situations in which two information sources can only be fused at a single
level.


Let us consider the expression

$$C(a_1, a_2, \ldots, a_N) = \arg\min_c \Big\{ \sum_{a_i} d(c, a_i) \Big\},$$

where $a_i$ are numbers in $\mathbb{R}$ and where d is a distance defined over D.
Then, the following hold:

1. When $d(a, b) = (a - b)^2$, C is the arithmetic mean. That is,
$C(a_1, a_2, \ldots, a_N) = \sum_{i=1}^{N} a_i / N$.

2. When $d(a, b) = |a - b|$, C is the median. The median of $a_1, a_2, \ldots, a_N$
is the element that occupies the central position when the elements $a_i$ are
ordered. The median is formally defined in Definition 6.7.

3. When $d(a, b) = 1$ if and only if $a \ne b$ (and $d(a, b) = 0$ otherwise), C is
the plurality rule (mode or voting). That is, $C(a_1, a_2, \ldots, a_N)$ selects
the element in $\mathbb{R}$ that appears most often in $(a_1, a_2, \ldots, a_N)$.

Fig. 1.5. Aggregation as the object that is located at the minimum distance of the
objects being aggregated
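
The three statements of Figure 1.5 can be checked numerically with a brute-force search over candidate values. This is a sketch under stated assumptions: the sample values, the candidate grid, and the function name `aggregate_by_distance` are hypothetical, and the third distance is taken to be the discrete one (0 for equal elements, 1 otherwise).

```python
from statistics import mean, median, mode


def aggregate_by_distance(values, d, candidates):
    """Return the candidate c that minimizes sum_i d(c, a_i) (brute force)."""
    return min(candidates, key=lambda c: sum(d(c, a) for a in values))


values = [1, 2, 2, 3, 7]
grid = [k / 100 for k in range(0, 801)]  # candidates 0.00, 0.01, ..., 8.00


def squared(a, b):
    return (a - b) ** 2


def absolute(a, b):
    return abs(a - b)


def discrete(a, b):
    return 0 if a == b else 1


c_mean = aggregate_by_distance(values, squared, grid)
c_median = aggregate_by_distance(values, absolute, grid)
c_mode = aggregate_by_distance(values, discrete, values)

print(c_mean, mean(values))
print(c_median, median(values))
print(c_mode, mode(values))
```

The grid search is only for illustration. For the squared and absolute distances the minimizers have the closed forms given in the figure, and for the discrete distance it suffices to search among the input values themselves.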


1.3.1 Function Construction 


A pivotal consideration in any information fusion system is the actual method
used for combining information. Its definition is the cornerstone of any
integration system. Two methods can be distinguished. They roughly correspond
to a priori and a posteriori analyses of the method’s properties.

Definition from properties: The starting point for defining the method is
a set of properties considered as a requirement for the method. From these
properties the function is derived using mathematical tools. This is the
approach used when applying functional equations (see Chapter 3). The
definition of aggregation as the object that minimizes a given expression
follows the same idea.

This approach is formulated as follows: the aggregation of the values
$a_1, a_2, \ldots, a_N \in D$, denoted by $C(a_1, a_2, \ldots, a_N)$, is the
object c located at the minimum distance of the objects being aggregated.
That is,

$$C(a_1, a_2, \ldots, a_N) = \arg\min_c \Big\{ \sum_{a_i} d(c, a_i) \Big\}, \qquad (1.2)$$

where d is a distance defined over D. The approach is valid in any
domain D where a distance d is defined. Figure 1.5 gives an example of it
when D is the set of real numbers.

Heuristic definition: In this case, the function is selected or defined
because it seems to satisfy user requirements or expectations. The function
is studied and its properties analyzed later.


An alternative method has also been proposed for function construction. It 
can be considered as an intermediate approach between a heuristic definition 
and a definition from properties. 


Definition from examples: This manner of definition follows classical sta- 
tistical estimation theory and supervised machine learning methods. The 
function is built as an estimator of some available examples. Therefore, the 
function approximates example outcomes given example inputs. A typical 
method is to use neural networks for such approximations. 
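
As a simple illustration of definition from examples, the sketch below fits the weights of a weighted sum to example input/output pairs by least squares, using plain gradient descent. This swaps in a much simpler estimator than the neural networks mentioned above. The example data, the "hidden" weights used to generate the outputs, the step size, and the number of iterations are all hypothetical choices, and no constraint (such as nonnegative weights that sum to one) is enforced.

```python
def weighted_sum(weights, inputs):
    return sum(p * a for p, a in zip(weights, inputs))


def fit_weights(examples, steps=2000, lr=0.1):
    """Fit weights p that minimize sum_k (sum_i p_i * a_ki - y_k)^2
    by gradient descent on the squared error."""
    n = len(examples[0][0])
    weights = [1.0 / n] * n  # start from the arithmetic mean
    for _ in range(steps):
        grad = [0.0] * n
        for inputs, target in examples:
            error = weighted_sum(weights, inputs) - target
            for i, a in enumerate(inputs):
                grad[i] += 2.0 * error * a
        weights = [p - lr * g for p, g in zip(weights, grad)]
    return weights


# Hypothetical examples: outputs produced by a "hidden" weighted mean.
hidden = [0.5, 0.3, 0.2]
inputs = [
    [0.2, 0.8, 0.3],
    [0.7, 0.7, 0.8],
    [0.9, 0.1, 0.5],
    [0.4, 0.6, 0.1],
    [0.1, 0.3, 0.9],
]
examples = [(a, weighted_sum(hidden, a)) for a in inputs]

learned = fit_weights(examples)
print([round(p, 4) for p in learned])
```

Because the example outputs are noise-free and the inputs are linearly independent, the fitted weights recover the hidden ones. With noisy examples the result would only approximate the outcomes, which is the sense in which the function is an estimator.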


1.4 Goals of Information Fusion 


Now that we have introduced information fusion and outlined some of its 
relevant aspects, we focus on its goals. 

We have said that information fusion deals with all the aspects of the 
fusion process, and its main task is to deal with fusion methods. Due to 
the development of new representation formalisms, the consideration of new 
applications, and the growth of computational power, information fusion is a 
dynamic field, and new methods are constantly being defined. At the same 
time, existing methods are being analyzed to determine their properties. The 
two main goals of the field are (i) formalization of aggregation processes and 
(ii) study of existing methods. The goals are described in more detail below. 


Formalization of the aggregation process: This is to find formal de- 
scriptions for processes (sometimes, intuitive processes) that are used for 
decision making and information fusion. Formal descriptions are needed so 

Arrow's impossibility theorem applies to aggregation of preferences (over a set of
alternatives). It proves that when there are at least three alternatives and at least
two preferences, there is no aggregation function that, for all sets of preferences,
satisfies the following properties:

1. Any preference can be obtained as the result of the function.
2. The function does not imply dictatorship (i.e., the function is not just one of
   the preferences).
3. The function is monotone, i.e., if one preference is modified so that one alternative
   is promoted, the function should at least avoid demoting such alternative.
4. The function satisfies the independence of irrelevant alternatives. That is, the
   final preference of x over y should be independent of preferences for other
   alternatives.

Fig. 1.6. Arrow’s impossibility theorem


that problems can be solved in an effective and sound way. Nevertheless,
model building (the procedure of building a formal description) is not an
easy task. To help with it, methodologies for function selection and tools
for parameter determination (e.g., algorithms) are required.
Moreover, in some situations, we need to consider the definition of new
aggregation operators, as existing methods are not appropriate because
they do not satisfy the desired properties or, worse, do not fit with the
current representation formalism in use. The goal can be decomposed as
follows:

1. Function definition: The construction of new functions on the basis 
of new properties or when considering new knowledge representation 
formalisms has been studied for a long time. For example, in the frame- 
work of aggregation of preferences (or of alternative selection based on 
preferences), Llull (thirteenth century) and Nicholas of Cusa (Nicholas 
Cusanus) (fifteenth century) proposed methods that were later redis- 
covered by Condorcet and Borda (eighteenth century). They are the 
Condorcet rule (with the Copeland method for solving ties) and the 
Borda count. A related approach, important in real-world applica- 
tions, is to study when no function exists that satisfies a set of prop- 
erties. Arrow’s impossibility (or incompatibility) theorem is a result 
of this kind. We recall that it applies to functions to aggregate pref- 
erences and that Arrow proved that there is no aggregation function 
that satisfies a set of natural axioms. The theorem is reproduced in 
Figure 1.6. 

2. Function selection: This corresponds to methods for deciding the most 
appropriate function in a given situation. At present, this can be done, 
as pointed out in Section 1.3.1, heuristically, on the basis of properties
or from examples. 

3. Parameter determination: This stands for algorithms and mechanisms 
for finding the best parameterization of a given aggregation operator. 
Methods are mainly based on expert interviews or are example based. 
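The two classical rules mentioned under function definition can be sketched in a few lines of Python. The functions `borda` and `copeland` below are illustrative names, and the Copeland score used here (pairwise wins minus pairwise losses) is one common variant, chosen as an assumption; preferences are given as rankings with the most preferred alternative first.

```python
from itertools import combinations
from typing import Dict, List

Ranking = List[str]  # most preferred alternative first


def borda(profile: List[Ranking]) -> Dict[str, int]:
    """Borda count: with m alternatives, position i (0-based) earns m - 1 - i points."""
    m = len(profile[0])
    scores = {x: 0 for x in profile[0]}
    for ranking in profile:
        for pos, x in enumerate(ranking):
            scores[x] += m - 1 - pos
    return scores


def copeland(profile: List[Ranking]) -> Dict[str, int]:
    """Copeland score: pairwise majority wins minus pairwise majority losses."""
    alts = profile[0]
    scores = {x: 0 for x in alts}
    for x, y in combinations(alts, 2):
        x_over_y = sum(r.index(x) < r.index(y) for r in profile)
        y_over_x = len(profile) - x_over_y
        if x_over_y > y_over_x:
            scores[x] += 1
            scores[y] -= 1
        elif y_over_x > x_over_y:
            scores[y] += 1
            scores[x] -= 1
    return scores


# Hypothetical profile of three voters over three alternatives.
profile = [["a", "b", "c"], ["a", "c", "b"], ["b", "c", "a"]]
# A cyclic profile, where pairwise majorities give no winner.
cycle = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]

print(borda(profile), copeland(profile), copeland(cycle))
```

In the first profile alternative a beats every other alternative in pairwise contests, so it is the Condorcet winner and also has the highest Borda score. In the cyclic profile all Copeland scores are equal, which illustrates the kind of difficulty that impossibility results such as Arrow's formalize.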


Study of existing methods: For most knowledge representation systems 
there exists a large set of aggregation methods that can be applied. To 
apply them properly we need to know their intrinsic differences. Three 
categories can be distinguished in relation to the properties: 

1. Function characterization: This is to know, on the one hand, which 
properties a particular operator satisfies and, on the other hand, which 
operators satisfy a set of properties. Functional equations are basic 
tools for function characterization. 

2. Determination of function’s modeling capabilities: The selection of an 
aggregation function corresponds to a tradeoff between expressivity 
and simplicity. In this respect, we know that aggregation operators can 
be used to build universal approximators (to approximate an arbitrary 
function at the desired level of detail). There exist some general models 
based on quasi-arithmetic means and Choquet integrals. However, to 
use such general models in practice is a difficult task, because on 
the one hand they require a large number of parameters and on the 
other hand they are difficult to interpret. In contrast, the arithmetic 
mean does not use any parameter, while its modeling capability is very 
limited (it corresponds to a completely determined hyperplane). In 
this framework, the determination of a function's modeling capability 
corresponds to locating it in the broad range of operators between the 
arithmetic mean and the general model. 

3. Relationship between operators and parameters: Most aggregation operators
are parametric and, therefore, their behavior strongly depends
on the parameters. It is important to know how parameters can affect
the result. For example, it is important to know whether there exists a
parameterization that gives the dictatorship property to one of the information
sources (dictatorship can be represented with the weighted mean but
not with the OWA operator; see Section 6.1), how sensitive the operator
is to changes in the data (according to the parameterization), and how
much the output changes when the parameters change (needed when
parameters are extracted from examples).
To help in this analysis, some indices have been defined. Some of them
(e.g., orness) will be reviewed in Chapter 7.
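The remark on dictatorship can be illustrated with a short Python sketch. It assumes the usual definitions of the weighted mean and of the OWA operator (weights applied to the inputs sorted in decreasing order) and Yager's standard orness index; the function names and the data are chosen here for illustration and are not taken from later chapters.

```python
from typing import Sequence


def weighted_mean(weights: Sequence[float], values: Sequence[float]) -> float:
    """Weights are attached to the information sources (positions)."""
    return sum(w * a for w, a in zip(weights, values))


def owa(weights: Sequence[float], values: Sequence[float]) -> float:
    """Weights are attached to the ranks of the values (largest value first)."""
    ordered = sorted(values, reverse=True)
    return sum(w * a for w, a in zip(weights, ordered))


def orness(weights: Sequence[float]) -> float:
    """Yager's orness of an OWA weighting vector (assumes at least two weights)."""
    n = len(weights)
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)


values = [0.2, 0.9, 0.5]
print(weighted_mean([1, 0, 0], values))  # always the first source: a dictator
print(owa([1, 0, 0], values))            # always the maximum, whichever source gave it
print(orness([1, 0, 0]), orness([0, 0, 1]), orness([1 / 3] * 3))
```

With the weighting vector (1, 0, 0) the weighted mean reproduces the first source regardless of the other inputs, whereas the OWA operator returns the maximum and therefore does not single out any source.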


In this section, we have given a classification of current goals of information 
fusion and its research. Nevertheless, this classification is not crisp, as there 
are some research topics that can be found across several different areas. One 
of them is parameter determination according to the bias-variance trade-off. 
This is somehow equivalent to the selection of a model that sufficiently fits the
data, but does not overfit it. This requires approaches from function selection,
parameter determination, and also approaches related to functions’ modeling
capabilities.


1.5 Bibliographical Notes 


1. Information fusion: Information fusion and integration is a broad field,
with applications in several fields of human knowledge. Due to this,
aside from its pure mathematical research, work on it is published in jour- 
nals and conferences on a wide range of topics. Our bias, and, therefore, 
the bibliographical references used and consulted for preparing this book, 
is towards artificial intelligence, mathematics, economics, remote sensing, 
and multisensor fusion applications. 

2. Information integration and architectures: The way we have struc- 
tured this chapter and our vision of the field are mainly based on sensor 
fusion and integration. Reference books in this field include [1] and [47]. 
See also the review paper by Hall and Llinas [179]. The chapter by Luo and 
Kay [241] in [1] gives a nice state-of-the-art description (from the 1990s) 
of data fusion and sensor integration. Most of the concepts reviewed can 
be easily translated to other fields, such as artificial intelligence. Several 
reference papers on fusion and related issues (e.g., sensor and data fusion, 
decision making) have been collected by Sadjadi in [345]. 

Differences between information integration and information fusion ex- 
plained in this chapter mainly correspond to the ones in [241], while our 
definition of information fusion is based on [435] and [166]. 

Sensor fusion has devoted much effort to research on architectures. The 
architecture presented here is based on [47], with elements of [241]. In
particular, the definition of preprocessing as "putting the data in a form that
is computationally appropriate" is from [47]. Additionally, the difference
between active and passive acquisition can be found in [193].

3. Aggregation operators: There is no standard definition of aggregation 
operators. For example, Cauchy [66] and more recently Ovchinnikov [306] 
only require a function returning a value between the minimum and the 
maximum, while [138] and [449] also require symmetry. In this book, we 
follow the first approach initially and then add some consideration on the 
background knowledge later. 

As stated, internality (Equation 1.1) means that an operator leads to a 
value between the minimum and the maximum. This property is used by 
Cauchy (1821) in [66]. Ovchinnikov in [306] refers to operators that satisfy 
this property as compensative functions. [246] refers to them as internal 
functions. 

Additional references on aggregation operators and related topics are 
given in the bibliographical notes of other chapters of this book, especially
Chapters 4 and 6. 
Aggregation operators defined as the minimization of a distance (as in
Equation 1.2 in Section 1.3.1) have been extensively used in the literature. 
For example, Fodor and Roubens in their book [146] (p. 143) use this 
approach to define the aggregation of relations. Similar results focused on 
biology can be found in [36] and [87]. They correspond, respectively, to 
methods for aggregating dendrograms and sequences. See also [141] for a 
recent application of these aggregation methods to bioinformatics. In this 
setting, the resulting aggregation function is known as the median rule. 
The examples given in Figure 1.5 are proved in Gini's book (1958) [163]. 
In particular, the result about the arithmetic mean is proved on p. 168, 
and the one about the median on p. 176. The property concerning the 
plurality rule is given on p. 185. 

Jackson (1921) [201] includes some results for the same problem when
the distance equals $d_p(a, b) = |a - b|^p$ for some $p \geq 1$. It shows that for
$p > 1$ there is a single solution. It also studies the case for $p = 1$, which
corresponds to the median. It shows that there is a unique solution when $N$
is either of the form $N = 2k + 1$ or of the form $N = 2k$ with $a_{s(k)} = a_{s(k+1)}$,
where $s$ is an order statistics (a permutation such that $a_{s(i)} \leq a_{s(i+1)}$).
In the case with $N = 2k$ and $a_{s(k)} \neq a_{s(k+1)}$, any value $a$ in the interval
$[a_{s(k)}, a_{s(k+1)}]$ is a valid solution. Nevertheless, the paper also shows that
the following holds for the limit of $p \to 1$:

$$\lim_{p \to 1} \arg\min_c \Big\{ \sum_{i=1}^{N} |c - a_i|^p \Big\} = m,$$

where $m$ is characterized by

$$(m - a_{s(1)}) \cdots (m - a_{s(k)}) = (a_{s(k+1)} - m) \cdots (a_{s(N)} - m).$$

From this, it can be shown that for $N = 2$, $m$ should be $m = (a_1 + a_2)/2$
and that for $N = 4$, $m$ corresponds to

$$\frac{a_4 a_3 - a_2 a_1}{(a_4 + a_3) - (a_2 + a_1)}.$$

Note that the standard definition of median for $N = 2k$, $(a_{s(k)} + a_{s(k+1)})/2$,
does not correspond, in general, to this limit (see Definition 6.7).

The cases with $p = 1$ and $p = 2$ (the latter corresponds to the arithmetic mean)
were already studied by different authors. For example, it was known
by Laplace [221] (supplement 1812-1818) and Svanberg (attributed) [20]
pp. 194-195 (1821). The case of $p \to \infty$ corresponds to the midrange of
$\{a_1, \ldots, a_N\}$. That is, $(a_{s(1)} + a_{s(N)})/2$. Foster (1922) [149] studied the
case of $p \to 0$, showing that it corresponds to the mode. The case of
weighted distances was studied in [35] (1938). It leads to the mode ($p \to 0$),
weighted median ($p = 1$), weighted mean ($p = 2$), and, again, the midrange
for $p \to \infty$.
4. Applications and examples: The cited chapter by Luo and Kay [241]
describes several systems in some detail. They are examples of sensor fu- 
sion. Among them, we underline the example of the tank detection system, 
where fusion is performed at several levels. This example was outlined in 
Section 1.3. The other example in the same section on knowledge elicita- 
tion is taken from [409]. 

Luo and Kay also give an example that corresponds to the indirect fu- 
sion (guiding and cueing) described in Section 1.2. It is the description of a 
robotic object recognition system that uses vision to guide tactile sensing. 
Other examples of aggregation operators for either numerical or ordinal 
scales are given in [43]. In particular, [43] includes a description of the 
Human Development Index and several methods for aggregating grades. 
Some fusion methods in biology are described in [244], [67], and [86]. [244] 
deals with fusion of taxonomies ([318] is an application of such aggrega- 
tion methods for comparing phylogenetic trees), while [67] and [86] deal 
with fusion of sequences. Methods for the aggregation of partitions, also 
used to aggregate nonhierarchical classifications in biology, can be found 
in [143] and [266]. Examples of fusion techniques for computer science can 
be found in [104] and [105]. 

Decision making is described in several books. See [340] for a state-of- 
the-art (1996) description of the field. Other examples briefly pointed out 
in this chapter include plan merging and ensemble methods. Methods for 
plan merging are described in [96, 150]. Ensemble methods are a successful 
technique applied in machine learning and are nowadays described in most 
machine-learning books. See [182] and [436]. 


5. Goals of information fusion: Section 1.4 is largely based on our own
research. Ramon Llull's (thirteenth century) findings on electoral systems
can be found in [234] (Chapter XXIV), [176], and also on a Web page [235].
[176] and [235] include English translations as well as transcripts of Llull's 
original works in either Catalan (for example, the novel Blanquerna [234] 
written c. 1283 [369]) or Latin (Artifitium electionis personarum and De 
arte eleccionis). Llull’s election method anticipated Condorcet's (eighteenth
century) (Llull uses Copeland's method for solving ties). Nicholas of Cusa (or
Cusanus) introduced an alternative method to Llull’s in 1431 (in his work
De concordantia catholica) that corresponds to the Borda count. Ramon
Llull and Cusanus were motivated by a need to find a method for honest 
elections in the Church. 

The papers by McLean [257] and McLean and London [258] are also of
interest here. They discuss Ramon Llull’s contributions in the context of
medieval voting, and the influence of Ramon Llull on Cusanus. Chapter
37 of Book III of De concordantia catholica by Cusanus is reproduced
in [257] and [258]. This book was written while Cusanus was attending
the Council of Basel (1431-1434). [258] argues that the method proposed
by Llull in Blanquerna corresponds to the Borda count. In this respect,
we agree with the later interpretation by Hagele and Pukelsheim [176],
rather than with the one by McLean and London.

The papers De arte eleccionis and Artifitium electionis personarum 
were rediscovered, respectively, by Honecker [191] in 1937 and by Perez 
Martinez [320] in 1959. The first work was found in the library of the 
Sankt Nikolaus-Hospital/Cusanusstift in Bernkastel-Kues and seems to 
have been copied by Cusanus himself (see [176] p. 6). 

Arrow’s impossibility theorem was given in [23]. For a history of voting 
procedures, see [426] or [259]. Arrow’s theorem is described in several 
books on preference, choice, and decision. See [332], and also the handbook 
edited by Arrow, Sen, and Suzumura [24]. 

The definition of models based on aggregation operators that are universal
approximators can be found in [399] and [277, 290, 413]. The former
work defines a model based on quasi-weighted means and the latter define
models based on Choquet integrals.

The bias-variance tradeoff is described in most machine learning and
statistical learning books. See [182] and [296]. 
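The limit result attributed to Jackson in note 3 of these bibliographical notes can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the minimizer for $p > 1$ is found by bisection on the derivative of the objective (which is increasing in $c$ when $p > 1$), and the closed form is the $N = 4$ expression quoted in that note, applied to the sorted inputs.

```python
from typing import Sequence


def lp_minimizer(values: Sequence[float], p: float, tol: float = 1e-12) -> float:
    """Minimize sum_i |c - a_i|**p over c, for p > 1, by bisection on the slope."""
    def slope(c: float) -> float:
        # Derivative of the objective divided by p; increasing in c when p > 1.
        return sum((1 if c > a else -1) * abs(c - a) ** (p - 1) for a in values)

    lo, hi = min(values), max(values)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def jackson_limit_n4(values: Sequence[float]) -> float:
    """Closed form of the p -> 1 limit for N = 4 (undefined if all values coincide)."""
    a1, a2, a3, a4 = sorted(values)
    return (a4 * a3 - a2 * a1) / ((a4 + a3) - (a2 + a1))


values = [1.0, 2.0, 4.0, 8.0]
print(lp_minimizer(values, 2.0))     # arithmetic mean, 3.75
print(lp_minimizer(values, 1.001))   # close to the limit below
print(jackson_limit_n4(values))      # 30/9
print((values[1] + values[2]) / 2)   # standard median for N = 2k, here 3.0
```

For these hypothetical data the minimizer for $p$ close to 1 approaches 30/9 rather than the standard median 3, in line with the remark that the two do not coincide in general.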
Basic Notions


Kai yori hajimeyo¹


Japanese saying 


In this chapter we will review some of the concepts that are needed later in the 
book. In particular, we focus on measurement theory and some basic elements 
of probability theory and fuzzy sets theory. 


2.1 Measurement Theory 


Per levar los molts y notables inconvenients que resúltan de haver-hi
diversitat de pesos, midas y mesuras en las ciutats, vilas y locs del
Principat de Cathalunya y Comtats de Rosselló y Cerdanya, (...)
statuim e ordenam, (...) que en part alguna de dits Principat y
Comtats no·s puga tenir, rebre, ni usar altre pes, mida ni mesura,
sinó la que se usa, y és approbada, en la ciutat de Barcelona (...)²


Corts de Montsó, Chapter 89 (1585), reproduced from [16], p. 99 


A working definition of measurement reads "the process of assigning numbers
to characteristics of objects or persons according to rules." Then, aggregation
operators are applied to such measurements so that the numbers are improved. 
As Roberts [334], after Hays [183], points out, one can always perform math- 
ematical operations on numbers (add them, average them, take logarithms, 


¹ Begin with something closest to you.

² To eliminate the large and remarkable inconveniences caused by the diversity of
weights, sizes and measures used in the cities, towns and villages of the Principality
of Catalonia and the Counties of Rosselló and Cerdanya, (...) we enact and
order (...) that in no part of the mentioned Principality and Counties one could
have, receive or use any weight, size or measure other than the one that is
used, and accepted, in the town of Barcelona.
and so on). However, the question is whether, after having performed such
operations, one can still deduce true (or, better, meaningful) statements about
objects.

This statement is of special relevance in the field of data fusion, as the way
in which aggregation operators operate can distort the real meaning of the 
values, and the outcome can be meaningless. Two examples are considered for 
illustrating this process. 


Example 2.1. Let $v_1 = (1, 1)$ and $v_2 = (-1, 1)$ be two vectors to be fused.
One alternative is to aggregate componentwise (using an arithmetic mean)
and, thus, define the aggregated vector $v_c$ by $v_c = ((1 - 1)/2, (1 + 1)/2)$. So,
the outcome is $v_c = (0, 1)$ using the arithmetic mean.

Nevertheless, this approach can be useless if $v_1$ and $v_2$ are the outcome of
two planning systems for a robot and correspond to the direction the robot
should take. If outcomes $v_1$ and $v_2$ are due to the fact that the planners want
to avoid a collision with an object just in front of the robot (precisely, in the
direction $v_c = (0, 1)$), the aggregation is completely inappropriate.

An alternative aggregation method that is more appropriate in this case 
is to fuse the angles between the vectors and the (0, 1) vector. 


Example 2.2. Let us consider the values (1, 1, 4, 4, 5) to be aggregated. One 
approach is to apply the arithmetic mean to them. Then, the output would 
be (1+1+4+4+5)/5 = 3. Three alternative scenarios for this computation 
are presented below. 


(i) Let the values (1, 2, 3, 4, 5) correspond to the identifiers of some search 
engines and let the selected values (1, 1, 4, 4, 5) be the best engines with 
respect to the performance, according to five different criteria. In this case, 
the arithmetic mean is not appropriate because it causes the selection of a 
nonoptimal search engine. An alternative approach suitable for this kind 
of problem is majority voting (i.e., select either search engine 1 or 4). 

(ii) Let the values represent grades of satisfaction in the set {very low, low,
medium, large, very large} (i.e., 1 is very low, 2 is low, and so on). In this
case, the average seems satisfactory. The outcome would correspond to
medium.

(iii) Let the values represent grades of satisfaction in the set {low, medium,
large, very large, optimum}. In this case, a value equal to 1 corresponds to
low and a value equal to 2 corresponds to medium. The average obtained 
with the arithmetic mean corresponds to large. Nevertheless, if we consider 
that the two values 1 (low) in (1, 1, 4, 4, 5) are compensated by the two 
values 4 (very large), it seems that the outcome should be larger than 
medium. The use of the median operator in the aggregation process would 
lead, with the data given above, to 4 (very large). So, in this context, 
the median seems a more suitable operator than the average. Moreover, 
when ordered categories are considered, the median is sounder than the 
arithmetic mean. 
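The three candidate operators discussed in Example 2.2 can be compared directly with Python's standard `statistics` module. The label list for the scale with labels low to optimum is written out here only to map the numerical results back to the ordinal terms; the variable names are illustrative.

```python
from statistics import mean, median, multimode

values = [1, 1, 4, 4, 5]

avg = mean(values)          # arithmetic mean
med = median(values)        # median
modes = multimode(values)   # most frequent values, as in majority voting

# Scale with labels low, ..., optimum, where value i corresponds to labels[i - 1].
labels = ["low", "medium", "large", "very large", "optimum"]

print(avg, labels[avg - 1])   # 3 large
print(med, labels[med - 1])   # 4 very large
print(modes)                  # [1, 4]
```

The mean is only interpretable when the identifiers carry numerical meaning, whereas the mode applies to identifiers such as search engines and the median applies to ordered categories.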
These examples show that an essential matter to be considered in
information fusion is the kind of operator meaningful in a given domain. This is
related to Measurement Theory. Measurement Theory gives a sound founda- 
tion of all matters related to measurement and scale. 


2.1.1 Measurement 


Measurement may be regarded as the construction of homomorphisms (or 
scales) from empirical relational structures into numerical relational struc- 
tures. Informally, the empirical relational structures correspond to structures 
found in the real world and the numerical ones correspond to ones in the 
framework we build to measure. 

An example related to Example 2.2 is the Mohs scale of hardness. Hardness
is a measure of a mineral's resistance to abrasion and it is well known that it
is defined in terms of the following ten minerals ($A :=$ {1. Talc, 2. Gypsum, 3.
Calcite, 4. Fluorite, 5. Apatite, 6. Orthoclase, 7. Quartz, 8. Topaz, 9. Corundum,
10. Diamond}). In this set, the $i$th mineral can scratch all minerals $j$
such that $j < i$. Then, the hardness of any other mineral is determined from
the standard set of minerals by checking which ones scratch it.

In relation to the Mohs scale, the empirical relational structure is the set
of minerals and their order in the real world with respect to hardness. The
numerical relational structure is the set of natural numbers in $\{1, \ldots, 10\}$ with the
$<$ relation. Measurement is, thus, the process of constructing the homomorphism
between the two relational structures. In other words: measurement
is the process of assigning numbers that preserve certain conditions (such as
being able to scratch).

By the way, it has to be said that it is not always required that the
homomorphism map into a numerical relational structure. Measurement without
numbers is also possible. One example is the way students are graded in some
countries ({A, B, C, D, F} in American institutions).

Formally, a relational structure is a set $A$ with one or more relations $R_i$
(not necessarily binary) on $A$ and, possibly, some operations $\circ_i$ on $A$
($\circ_i : A \times A \to A$). Structures are expressed by tuples, as in $(A, R_1, R_2, \circ_1)$. For
example, in the case of the Mohs scale of hardness, we would have a set $A$
and a relation $>$. That is, $R := >$. Therefore, the relational structure is of the
form: $(A, >)$. Here, $a_1 > a_2$ for $a_1, a_2 \in A$ holds when $a_1$ is harder than $a_2$.

Another example is the measurement of long objects (i.e., measurement of
length). In this case, the measurement requires a set (say $A$), a relation (say
$>$), and an operation (say $\circ$). Thus, the relational structure is of the form:
$(A, >, \circ)$. In this case, the relation $>$ has a meaning similar to the case above:
$a_1 > a_2$ for $a_1, a_2 \in A$ holds when $a_1$ is longer than $a_2$. The operation $\circ$ stands
for the concatenation of long objects; $a_1 \circ a_2$ stands for putting $a_2$ after $a_1$.
The consideration of this operation is appropriate for dealing with addition
of lengths.
Numerical relational structures correspond to the case in which we consider
a set of numbers (e.g., $\mathbb{R}$), and, therefore, relations and operations correspond
to relations and operations over the set of numbers. In the case of the Mohs
scale, we would have $(\mathbb{N}_{10}, >)$, and in the case of the measurement of length,
we would have $(\mathbb{R}^+, >, +)$. Here $\mathbb{N}_{10}$ stands for integers between 1 and 10, and
$\mathbb{R}^+$ is the set of positive real numbers.

Given two relational structures, a homomorphism is a mapping from one 
relational structure to the other in such a way that relations and operations 
are preserved. Formally, given the relational structures 


$$(A, R_1, \ldots, R_r, \circ_1, \ldots, \circ_s)$$

and

$$(B, R'_1, \ldots, R'_r, \circ'_1, \ldots, \circ'_s),$$

$\phi : A \to B$ is a homomorphism from the first structure into the second if, for
all $a_1, \ldots, a_{r_i} \in A$ (where $r_i$ is the number of arguments of $R_i$), the
following two conditions hold:

- $R_i(a_1, \ldots, a_{r_i})$ if and only if $R'_i(\phi(a_1), \ldots, \phi(a_{r_i}))$ for all $i = 1, \ldots, r$.
- $\phi(a_1 \circ_i a_2) = \phi(a_1) \circ'_i \phi(a_2)$ for all $i = 1, \ldots, s$.

Measurement is based on homomorphism instead of isomorphism because
the mapping is not usually one-to-one. In other words, $\phi(a_1) = \phi(a_2)$ does
not imply that $a_1$ and $a_2$ are the same. For example, two objects may have
the same hardness or two students may have the same mark.
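The first homomorphism condition can be checked mechanically for a small structure with one binary relation. The following sketch uses a hypothetical set of observations ("x scratches y") for three Mohs minerals plus an unnamed sample that is as hard as gypsum; the names `is_homomorphism`, `phi_ok`, and `phi_bad` are introduced only for this illustration.

```python
from itertools import product

# Hypothetical empirical observations: (x, y) means "x scratches y".
scratches = {
    ("gypsum", "talc"), ("sample", "talc"),
    ("calcite", "talc"), ("calcite", "gypsum"), ("calcite", "sample"),
}
objects = ["talc", "gypsum", "sample", "calcite"]


def harder(a1, a2):
    """Empirical relation on the set of objects."""
    return (a1, a2) in scratches


def greater(x, y):
    """Numerical relation on the assigned numbers."""
    return x > y


def is_homomorphism(phi, elements, empirical, numerical):
    """True if empirical(a1, a2) holds exactly when numerical(phi(a1), phi(a2)) holds."""
    return all(empirical(a1, a2) == numerical(phi[a1], phi[a2])
               for a1, a2 in product(elements, repeat=2))


phi_ok = {"talc": 1, "gypsum": 2, "sample": 2, "calcite": 3}
phi_bad = {"talc": 1, "gypsum": 3, "sample": 2, "calcite": 2}

print(is_homomorphism(phi_ok, objects, harder, greater))   # True
print(is_homomorphism(phi_bad, objects, harder, greater))  # False
```

Note that `phi_ok` assigns the same number to two different objects, which is why the mapping is a homomorphism and not an isomorphism.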


2.1.2 Representation and Uniqueness Theorems 


As has been stated, the purpose of measurement is to establish a homomor- 
phism between the empirical relational structure and the numerical relational 
structure. This homomorphism is built once a set of axioms over the structure 
are established. Then, the representation theorems and uniqueness theorems 
are proved. 

Representation theorems establish the existence of the homomorphism (say
$\phi$) into the numerical relational structure. Uniqueness theorems establish the
permissible transformations over $\phi$ that also yield homomorphisms into the
same numerical relational structure. For example, in the case of measuring
length, it is not possible to replace values in $\mathbb{R}$ with their logarithms (i.e.,
change lengths $a$ to $\log(a)$) because the addition, $+$, will not be consistent
with the $\circ$ operation. However, multiplying all values by a positive constant
will keep $+$ and $\circ$ consistent.
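The length example can be verified with a few lines of Python. In this sketch, concatenation of two objects is modeled by adding their measured lengths, and a transformation is tested for whether it keeps addition consistent with concatenation. The helper `additive_under`, the sample lengths, and the metre-to-feet factor are assumptions made for the illustration.

```python
import math
from typing import Callable, Sequence


def additive_under(psi: Callable[[float], float],
                   lengths: Sequence[float],
                   tol: float = 1e-9) -> bool:
    """True if psi(x + y) equals psi(x) + psi(y) for all pairs of measured lengths."""
    return all(abs(psi(x + y) - (psi(x) + psi(y))) <= tol
               for x in lengths for y in lengths)


def to_feet(x: float) -> float:
    """Change of unit: multiplication by a positive constant (metres to feet)."""
    return 3.28084 * x


lengths_m = [0.5, 1.2, 3.0]  # hypothetical lengths in metres

print(additive_under(to_feet, lengths_m))   # True: a permissible transformation
print(additive_under(math.log, lengths_m))  # False: logarithms break additivity
```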

An example of a representation and uniqueness theorem is given below for
illustration. It corresponds to the establishment of an ordinal scale on finite
sets. We start with the definition of a relational structure with the set and the
weak order $\succeq$. Note that this example relates to the Mohs scale considered
above.
Name            Permissible transformations                  Examples
--------------  -------------------------------------------  ---------------------------------------------
Absolute        $\psi(x) = x$                                counting, numbers
Ratio scale     $\psi(x) = \alpha x$ (for $\alpha > 0$)      mass (pounds to kg: $\psi(p) = 0.4536p$)
                                                             length (miles to km: $\psi(m) = 1.6093m$)
Interval scale  $\psi(x) = \alpha x + \beta$                 time (calendar)
                                                             temperature (Celsius/Fahrenheit)
Ordinal scale   $\psi(x)$ such that                          preferences
                $x > y$ implies $\psi(x) > \psi(y)$          Mohs’ scale to measure hardness
                $x = y$ implies $\psi(x) = \psi(y)$
Nominal         $\psi(x)$ one-to-one                         subjects in a school
                                                             brands of products

Table 2.1. Major scale types


Definition 2.3. Let $A$ be a set and $\succeq$ be a binary relation on $A$. The relational
structure $(A, \succeq)$ is a weak order if and only if, for all $a_1, a_2, a_3 \in A$, the
following two axioms are satisfied:


Connectedness: Either $a_1 \succeq a_2$ or $a_2 \succeq a_1$.
Transitivity: If $a_1 \succeq a_2$ and $a_2 \succeq a_3$, then $a_1 \succeq a_3$.


Now, we give a representation theorem for this relational structure studied 
by Cantor: 


Theorem 2.4. (Cantor’s representation theorem) Let $(A, \succeq)$ be a weak order
with $A$ a finite nonempty set; then, there exists a real-valued function $\phi$ on
$A$ such that, for all $a_1, a_2 \in A$,

$$a_1 \succeq a_2 \text{ if and only if } \phi(a_1) \geq \phi(a_2).$$

Therefore, when connectedness and transitivity are satisfied in the relational
structure $(A, \succeq)$, there exists a homomorphism $\phi$ into the numerical
relational structure $(\mathbb{R}, \geq)$. The homomorphism $\phi$ is said to be an order
homomorphism.
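For a finite set, one standard way to obtain such a function is to count, for each element, how many elements it weakly dominates. The sketch below follows that construction; it is an illustration under assumptions (the element names, the tiers, and the function names are hypothetical), not a transcription of the proof in any particular reference.

```python
from typing import Callable, Dict, Hashable, Sequence


def order_homomorphism(elements: Sequence[Hashable],
                       weakly_prefers: Callable[[Hashable, Hashable], bool]
                       ) -> Dict[Hashable, int]:
    """phi(a) = number of elements b such that a is at least as preferred as b."""
    return {a: sum(1 for b in elements if weakly_prefers(a, b)) for a in elements}


# Hypothetical weak order given by tiers, least preferred first; y and z are tied.
tiers = [{"x"}, {"y", "z"}, {"w"}]
elements = ["x", "y", "z", "w"]


def tier_of(a):
    return next(i for i, tier in enumerate(tiers) if a in tier)


def weakly_prefers(a, b):
    return tier_of(a) >= tier_of(b)


phi = order_homomorphism(elements, weakly_prefers)
print(phi)  # {'x': 1, 'y': 3, 'z': 3, 'w': 4}

# The representation condition of Theorem 2.4 holds for every pair.
assert all(weakly_prefers(a, b) == (phi[a] >= phi[b])
           for a in elements for b in elements)
```

Connectedness and transitivity are what make the counting argument work: if the relation fails either axiom, the resulting numbers need not represent it.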


Theorem 2.5. Let $(A, \succeq)$ be a weak order with $A$ a finite nonempty set, and
let $\phi$ be an order homomorphism. Then $\phi'$ is another order homomorphism
on $A$ if and only if there exists a strictly increasing function $f$, with domain
and range equal to $\mathbb{R}$, such that, for all $a \in A$,

$$\phi'(a) = f(\phi(a)).$$


This uniqueness theorem establishes that the only valid transformations 
are the ones generated by strictly increasing functions. 
2.1.3 Uniqueness Theorems and Scale Type


Uniqueness theorems establish permissible transformations for $\phi$. We say that
a transformation $\psi$ is permissible when, applied to $\phi$, it yields a homomorphism
from the empirical relational structure into the numerical one. For
example, we have in Theorem 2.5 that strictly increasing functions are the
only permissible transformations for $\phi$.

Table 2.1 displays the most common scales. The name of the scale, permis- 
sible transformations, and some examples are given for each scale. We briefly 
review them here. 


1. Absolute scale (first row in Table 2.1): This corresponds to the case 
where no transformations are possible. Counting and numbers are exam- 
ples of this scale. 

2. Ratio scale (second row): This corresponds to measures in which there is
an absolute zero and only the unit of measurement can be changed. Thus,
$\psi(x) = \alpha x$ for positive $\alpha$. This is the case for length (meters vs. miles)
and mass (grams vs. pounds). Note that zero is an absolute value (no
length or no mass) and that one measure in one unit can be changed into
a measure in the other unit by multiplying it by a constant. For example,
1 mile = 1.609344 km and 1 pound = 0.4536 kg.

In ratio scales, the ratio between two scale values is independent of
the actual scale used. For example, the ratio of the distances between
Barcelona and València (349 km, 216.85855 miles) and Barcelona and
Tarragona (99 km, 61.515747 miles) does not depend on the unit. In other
words, $\phi(a_1)/\phi(a_2) = \psi(\phi(a_1))/\psi(\phi(a_2))$. In the particular case of the
example considered, this is 349 / 99 = 216.85855 / 61.515747 = 3.5252526.

3. Interval scale (third row): In this case, affine transformations are allowed
(i.e., $\psi(x) = \alpha x + \beta$). Temperatures are an example of interval scales.
Fahrenheit temperatures ($F$) are computed from Centigrade ones ($C$) by
$F = 1.8C + 32$. In interval scales, the ratios of intervals are invariant. This
is formally expressed by

$$\frac{\phi(a_1) - \phi(a_2)}{\phi(b_1) - \phi(b_2)} = \frac{\psi(\phi(a_1)) - \psi(\phi(a_2))}{\psi(\phi(b_1)) - \psi(\phi(b_2))}.$$


4. Ordinal scale (fourth row): This example has been considered in Theorems
2.4 and 2.5. Any strictly increasing function is a permissible transformation.
The Mohs scale of hardness is an example of this scale. Preferences
are also often expressed using ordinal scales. Any ordered set of
values is equally appropriate to express the ordering. In the case of preferences,
it is equally valid to use the set {1, 2, 3, 4, 5}, the set {A, B, C, D, E},
or the set {very low, low, medium, large, very large} to express which
alternative we prefer. In this case, $\phi(1) = A$, $\phi(2) = B$, ..., $\phi(5) = E$, or,
similarly, $\phi'(1) =$ very low, $\phi'(2) =$ low, ..., $\phi'(5) =$ very large.
Name                Permissible transformations
------------------  ---------------------------------------------------------
Difference scale    $\psi(x) = x + \beta$
Log-interval scale  $\psi(x) = \alpha x^{\beta}$, for $\alpha, \beta > 0$

Table 2.2. Other scale types


5. Nominal scale (fifth row): In this case, any one-to-one transformation is
appropriate, as numbers do not have any intrinsic value (they do not codify
information). Any (numerical/nonnumerical) coding system is appropriate for
representing nominal scales. Subjects in schools and brands of products
are examples of nominal scales.
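The invariants stated for ratio and interval scales can be reproduced numerically. The sketch below reuses the distances and the conversion factors quoted in the list above; the temperature readings and the helper names are hypothetical values chosen for illustration.

```python
def km_to_miles(x: float) -> float:
    return x / 1.609344  # permissible for ratio scales: multiplication by a constant


def c_to_f(c: float) -> float:
    return 1.8 * c + 32  # permissible for interval scales: affine transformation


def interval_ratio(a1: float, a2: float, b1: float, b2: float) -> float:
    return (a1 - a2) / (b1 - b2)


# Ratio scale: ratios of values do not depend on the unit.
bcn_valencia_km, bcn_tarragona_km = 349.0, 99.0
ratio_km = bcn_valencia_km / bcn_tarragona_km
ratio_mi = km_to_miles(bcn_valencia_km) / km_to_miles(bcn_tarragona_km)

# Interval scale: ratios of intervals are invariant, ratios of values are not.
temps_c = (30.0, 20.0, 15.0, 10.0)  # hypothetical readings in degrees Celsius
temps_f = tuple(c_to_f(t) for t in temps_c)
r_c = interval_ratio(*temps_c)
r_f = interval_ratio(*temps_f)

print(ratio_km, ratio_mi)
print(r_c, r_f)
print(temps_c[0] / temps_c[1], temps_f[0] / temps_f[1])  # these two differ
```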


The consideration of the transformations on scales is of great importance 
in aggregation. This is so because when data is supplied in a given scale, the 
outcome of the method is expected to be in the same scale. In Chapter 4 
(Section 4.3), we will consider again the use of scales and their implications 
in aggregation. 
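
As a rough numerical illustration of these invariance properties, the following Python sketch checks that ratios of values are preserved by the ratio-scale transformation of the distance example, and that ratios of intervals (but not ratios of values) are preserved by the Centigrade-to-Fahrenheit conversion. The function names and the four temperatures are assumptions made for the illustration; they do not come from the text.

```python
def km_to_miles(x):
    # Ratio-scale transformation psi(x) = alpha * x; alpha is derived from
    # the figures quoted in the text (349 km = 216.85855 miles).
    return x * (216.85855 / 349)


def celsius_to_fahrenheit(c):
    # Interval-scale (affine) transformation psi(x) = 1.8 * x + 32.
    return 1.8 * c + 32


def ratio(a1, a2):
    return a1 / a2


def interval_ratio(a1, a2, b1, b2):
    return (a1 - a2) / (b1 - b2)


# Ratio scale: the ratio of two distances does not depend on the unit.
km = (349, 99)
miles = tuple(km_to_miles(d) for d in km)
ratio_km = ratio(*km)
ratio_miles = ratio(*miles)

# Interval scale: ratios of intervals are invariant, ratios of values are not.
celsius = (30.0, 10.0, 25.0, 15.0)  # hypothetical temperatures
fahrenheit = tuple(celsius_to_fahrenheit(t) for t in celsius)
interval_ratio_c = interval_ratio(*celsius)
interval_ratio_f = interval_ratio(*fahrenheit)
```

Under these assumptions, 30 degrees is "three times" 10 degrees only in Centigrade, which is why statements about ratios of values are not meaningful on an interval scale.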


Other scales 


The scales we have just described are the major scale types found in the 
literature. Other scales have also been defined and used in various applications. 
Some of them are included in Table 2.2. The difference scales might be used 
after logarithmic transformations have been applied to ratio scales. In the 
same way, log-interval scales correspond to exponential transformations of 
interval scales. 


2.2 Probability and Statistics 


It is possible to develop a theory of measure with the countable 
additivity requirement replaced by the weaker condition of finite 
additivity. The disadvantage of doing this is that the resulting 
mathematical equipment is much less powerful. However, a convincing 
physical justification of countable additivity has yet to be given. 


R. B. Ash, p. 6 [26] 


In this section we review some basic facts about probability measures. Such 
measures are the basic tool in probability theory to model randomness. In a 
random experiment the outcome cannot be predicted in advance, as different 
executions of the same experiment often lead to different outcomes. Dice and 
coins are classical examples, as each time we toss them we might get a different
outcome.

Probability theory is the mathematical theory to model these kinds of
situations. Probability measures assign values to possible outcomes. Formally,
probability measures are defined taking into account the set of all possible
outcomes of the process, i.e., the state space, the sample space, or the reference
set, often denoted by symbols such as $\Omega$ or $X$. In this book, we will use $X$.

When tossing a coin, we have X = {head, tail}, and when throwing a die,
we have X = {1,2,3,4,5,6}. In these examples, the space X is finite. Nev- 
ertheless, in general, X can be infinite, and either countable or uncountable. 
For example, we can consider as X the set of integers or the [0, 1] interval. 

Another basic concept is event. This corresponds to a property that can 
be checked after an experiment has been done. For example, after tossing the 
coin we can check whether it is tail or not, and after throwing the die we can 
determine whether it has an odd number or not. Formally, an event is a subset 
of the set X. 


Definition 2.6. In probability theory, we consider the concepts of state space 
and event: 


1. The state space or sample space is the set of all possible outcomes. 
2. An event is a subset of the state space. 


Probability measures are functions that assign a number to an event. Given
an event $A$ ($A \subseteq X$), the value $P(A)$ measures the likelihood of the event $A$
before performing the experiment: the higher $P(A)$, the more likely it is
that $A$ occurs. As events are subsets of $X$, probability
measures are set functions.

When finite sets $X$ are considered, probability measures can be defined
on all subsets of $X$. That is, $P$ is a function from the power set $\wp(X)$ into [0,1].
Nevertheless, in general, it is not possible to consider all subsets of $X$. This
is the case, for example, when $X$ is not finite. In such situations, measures
are defined over $\sigma$-algebras. They are subsets $\mathcal{A}$ of $\wp(X)$ with some particular
properties. These properties and the definitions of algebra and $\sigma$-algebra are
recalled below.


Definition 2.7. Let $X$ be a reference set, and let $\mathcal{A}$ be a subset of $\wp(X)$. Let
us consider the following properties:

Property 1: $\emptyset \in \mathcal{A}$ and $X \in \mathcal{A}$
Property 2: if $A \in \mathcal{A}$, then $X \setminus A \in \mathcal{A}$
Property 3: $\mathcal{A}$ is closed under finite unions and finite intersections:
    if $A_1, \dots, A_n \in \mathcal{A}$, then $\bigcup_{i=1}^{n} A_i \in \mathcal{A}$ and $\bigcap_{i=1}^{n} A_i \in \mathcal{A}$
Property 4: $\mathcal{A}$ is closed under countable unions and intersections:
    if $A_1, A_2, \dots \in \mathcal{A}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$ and $\bigcap_{i=1}^{\infty} A_i \in \mathcal{A}$

Then,

1. $\mathcal{A}$ is an algebra (or a field) if it satisfies Properties 1, 2, and 3.
2. $\mathcal{A}$ is a $\sigma$-algebra (or a $\sigma$-field) if it satisfies Properties 1, 2, and 4.


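Note that Properties 1 and 4 imply Property 3 (a finite union can be written
as a countable union by padding it with copies of $\emptyset$). Therefore, any $\sigma$-algebra
is an algebra. Nevertheless, the converse is not true.

When $\mathcal{A}$ is defined as $\mathcal{A} := \wp(X)$, we have that $\mathcal{A}$ is a $\sigma$-algebra, and, therefore,
$\mathcal{A}$ is also an algebra. In general, when $\mathcal{A}$ is a $\sigma$-algebra on the reference
set $X$, the pair $(X, \mathcal{A})$ is known as a measurable space. Therefore, $(X, \wp(X))$ is
a measurable space.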
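
For a finite reference set, the properties of Definition 2.7 can be checked mechanically. The sketch below is only an illustration under that finiteness assumption (in which case every algebra is also a $\sigma$-algebra); the helper names `powerset` and `is_algebra` and the two example families are hypothetical.

```python
from itertools import chain, combinations


def powerset(X):
    """Return all subsets of the finite set X as frozensets."""
    items = list(X)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def is_algebra(X, family):
    """Check Properties 1-3 of Definition 2.7 for a family of subsets of a finite X."""
    X = frozenset(X)
    family = {frozenset(A) for A in family}
    # Property 1: the empty set and X belong to the family.
    if frozenset() not in family or X not in family:
        return False
    # Property 2: closed under complement.
    if any(X - A not in family for A in family):
        return False
    # Property 3: closure under pairwise unions and intersections, which by
    # induction gives closure under all finite unions and intersections.
    return all(
        (A | B) in family and (A & B) in family
        for A in family
        for B in family
    )


X = {1, 2, 3, 4, 5, 6}
odd_even = [set(), {1, 3, 5}, {2, 4, 6}, X]   # an algebra on X
not_closed = [set(), {1}, {2}, X]             # complements are missing
```

Here `is_algebra(X, odd_even)` and `is_algebra(X, powerset(X))` hold, while `is_algebra(X, not_closed)` fails already at Property 2.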

Now, let us define probability measures. 


Definition 2.8. Let $X$ be a reference set and let $\mathcal{A}$ be a $\sigma$-algebra on $X$; then,
a set function $P$ is a probability measure if it satisfies the following conditions:

(i) $P(A) \ge 0$ for all $A \in \mathcal{A}$,

(ii) $P(X) = 1$, and

(iii) $P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$ for every countable sequence $A_i$ ($i \ge 1$) of $\mathcal{A}$
that is pairwise disjoint (i.e., $A_i \cap A_j = \emptyset$ when $i \ne j$).


Condition (iii) in this definition is known as countable additivity. The
axiom (iii') given below might be used to replace Condition (iii) when $X$ is
finite. This alternative condition is known as finite additivity:

(iii') $P(A \cup B) = P(A) + P(B)$ for all $A, B \in \mathcal{A}$ when $A \cap B = \emptyset$.

Nevertheless, in general, (iii) and (iii') are not equivalent, as (iii') only
implies (iii) for a finite number of pairwise disjoint sets. That is, (iii')
implies the following equality instead of implying (iii):

(iii'') $P(\bigcup_{i=1}^{n} A_i) = \sum_{i=1}^{n} P(A_i)$ for every finite sequence $A_i$ ($1 \le i \le n$) of $\mathcal{A}$
that is pairwise disjoint (i.e., $A_i \cap A_j = \emptyset$ when $i \ne j$).

So, countable additivity is not implied by finite additivity. For the purpose
of this book, the difference is not very relevant, as we focus on finite sets.
Therefore, in practice, condition (iii') suffices.

Conditions (i), (ii), and (iii'') on an algebra correspond to the Kolmogorov
axioms.

The following properties can be deduced for probability measures from 
Definition 2.8. 


Proposition 2.9. Let $(X, \mathcal{A})$ be a measurable space, and let $P$ be a probability
measure. Then, for all $A, A_1, \dots, A_n, B$ in $\mathcal{A}$, the following holds:

1. $P(\emptyset) = 0$
2. $P(A) \le 1$
3. $P$ is additive
4. $P(A) \le P(B)$ if $A \subseteq B$
5. $P(B \setminus A) = P(B) - P(A)$ if $A \subseteq B$
6. $P(X \setminus A) = 1 - P(A)$
7. $P(A_1 \cup \dots \cup A_n) \le P(A_1) + \dots + P(A_n)$ for $A_i \subseteq X$

Let $S$ be a set of subsets of $X$; then, the $\sigma$-algebra generated by $S$ is the set of subsets of
$X$ that

1. contains $S$
2. is a $\sigma$-algebra
3. is as small as possible

This $\sigma$-algebra generated by $S$ is denoted by $\sigma(S)$.
Let $O$ be the open subsets of $\mathbb{R}$; then, $\mathcal{B} = \sigma(O)$ is the Borel $\sigma$-algebra of $\mathbb{R}$.

Fig. 2.1. Definition of Borel $\sigma$-algebra


8. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
9. $P(\bigcup_{i=1}^{n} A_i) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \dots
   + (-1)^{n+1} P(A_1 \cap \dots \cap A_n)$

The last equality is known as the inclusion-exclusion formula.


Taking into account what has been explained above, a model for a random
experiment is defined in terms of the reference set $X$, the $\sigma$-algebra $\mathcal{A}$, and the
probability measure $P$. This is commonly referred to as a probability space,
and is denoted by $(X, \mathcal{A}, P)$.
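
To make these notions concrete, the following sketch builds a finite probability space for a fair die and checks a few of the properties listed in Proposition 2.9, including the inclusion-exclusion formula. The uniform weights and the three events are assumptions chosen for the illustration, and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import combinations

# Reference set for a die, with an assumed uniform probability on outcomes.
X = frozenset(range(1, 7))
weights = {x: Fraction(1, 6) for x in X}


def P(A):
    """Probability of the event A (a subset of X) under finite additivity."""
    return sum((weights[x] for x in A), Fraction(0))


def inclusion_exclusion(events):
    """Probability of the union of the events via the inclusion-exclusion formula."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for combo in combinations(events, k):
            total += sign * P(frozenset.intersection(*combo))
    return total


odd = frozenset({1, 3, 5})
low = frozenset({1, 2, 3})
prime = frozenset({2, 3, 5})
events = [odd, low, prime]

p_union_direct = P(odd | low | prime)
p_union_formula = inclusion_exclusion(events)
```

Both ways of computing the probability of the union give 2/3 for these events.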


2.2.1 Random Variables 


A random variable is a function that assigns values to the outcomes in the 
reference set. That is, a random variable is a function from X into a space S. 
For example, if X is the outcome of throwing two dice, then we can consider 
the random variable f that assigns to each outcome the sum of the values 
obtained by the two dice. Thus, in this case, we have $X = \{(1,1), (1,2), \dots,
(6,5), (6,6)\}$ and $f((a,b)) := a + b$.

Given a probability space $(X, \mathcal{A}, P)$ and a random variable $f : X \to S$, we
can define a new probability measure on the space $S$ from $P$. This is shown
below. Note that in this definition we need a $\sigma$-algebra $\mathcal{S}$ on the space $S$, as
the new probability should be defined over subsets of $S$.

Often, as $S$ is the set of real numbers, we define the $\sigma$-algebra $\mathcal{S}$ to be the
Borel $\sigma$-algebra. This $\sigma$-algebra (defined in Figure 2.1) contains all subintervals
in $S$.


Definition 2.10. Let $(X, \mathcal{A}, P)$ be a probability space and let $f : X \to S$ be a
random variable; then, the probability measure $P_f$ induced by $f$ on a $\sigma$-algebra
$\mathcal{S}$ on $S$ is defined as

$$P_f(A) := P(\{\omega : f(\omega) \in A, \omega \in X\})$$

for all $A$ in $\mathcal{S}$.

Fig. 2.2. Cumulative distribution function as the integral of a probability density
function


This probability is also known as the distribution of the random variable $f$ or as
the law of $f$. The probability $P_f(A)$ can be alternatively expressed by $P(f^{-1}(A))$
or $P(f \in A)$, where $f^{-1}(A)$ is the inverse image of $A$ under $f$ and corresponds to the set
$\{\omega : f(\omega) \in A, \omega \in X\}$.


Definition 2.11. Let $(X, \mathcal{A}, P)$ be a probability space and let $f : X \to S$ be
a random variable. Then, the cumulative distribution function of the random
variable $f$ is defined by

$$G_f(r) := P(\{x : f(x) \le r\})$$

or, using the probability measure $P_f$ on $(S, \mathcal{S})$, by

$$G_f(r) := P_f((-\infty, r]).$$


This function is often referred to as the distribution function, probability
distribution, or cdf (for cumulative distribution function).
When the cumulative distribution function $G_f$ can be expressed by

$$G_f(r) = \int_{-\infty}^{r} g_f(s)\,ds$$

for all $r$ in $\mathbb{R}$, we say that $g_f$ is the probability density function (or pdf), or,
simply, density of the random variable $f$. Figure 2.2 represents this expression.
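
For a discrete case such as the sum of two dice, the induced measure $P_f$ and the cumulative distribution function can be tabulated directly from their definitions. The sketch below assumes two fair dice (a uniform $P$ on the 36 outcomes); the names `induced` and `cdf` are illustrative. Since the variable is discrete, it has no density, and the cdf is a step function.

```python
from fractions import Fraction
from itertools import product

# Reference set: ordered pairs of values shown by two dice.
X = list(product(range(1, 7), repeat=2))


def f(outcome):
    """Random variable: sum of the two dice."""
    a, b = outcome
    return a + b


def P(event):
    """Uniform probability on X (assumption: fair dice)."""
    return Fraction(len(event), len(X))


def induced(A):
    """P_f(A) = P({w in X : f(w) in A})."""
    return P([w for w in X if f(w) in A])


def cdf(r):
    """G_f(r) = P_f((-inf, r])."""
    return induced({s for s in range(2, 13) if s <= r})


pmf = {s: induced({s}) for s in range(2, 13)}
```

For instance, `induced({7})` is 1/6 and `cdf(7)` is 7/12.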


2.2.2 Expectation and Moments 
If $f$ is a random variable on $X$, then the following definitions are of interest:


Definition 2.12. Let $(X, \mathcal{A}, P)$ be a probability space and let $f : X \to S$ be a
random variable. Then, the expectation of $f$ is defined by

$$E[f] = \int f \, dP$$

provided the integral exists.

When a random variable $f$ takes only a finite number of values, it is said
that $f$ is simple. In this case, if $f$ takes values $\{r_1, \dots, r_n\}$, the expectation is

$$E[f] = \sum_{i=1}^{n} r_i P(\{f = r_i\}).$$

When $f$ is a random variable on $(X, \mathcal{A}, P)$, the following expectations are
often considered for $k > 0$:

1. $E[f^k]$, the $k$th moment of $f$.
2. $E[|f|^k]$, the $k$th absolute moment of $f$.
3. $E[(f - E[f])^k]$, the $k$th central moment of $f$.

These moments are only defined when $E[f]$ is finite.

The mean (denoted by mean($f$) or $\bar{f}$) and the variance (denoted by $\sigma^2(f)$,
$\sigma_f^2$, or Var($f$)) are terms often used to refer to the first moment of $f$ and
to the second central moment of $f$. The positive square root of the variance is
known as the standard deviation of $f$ (denoted by $\sigma_f$). That is, mean($f$) := $E[f]$
and $\sigma^2(f) := E[(f - E[f])^2]$.


Proposition 2.13. If $E[f^2]$ is finite, then $E[|f|]$ is also finite, and we have

$$\sigma^2(f) = E[f^2] - E[f]^2.$$

Therefore, the following inequality also holds:

$$E[f]^2 \le E[f^2].$$
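
As a small worked instance of these definitions, the sketch below computes the mean, the second moment, and the variance of a simple random variable, and checks the identity of Proposition 2.13. The choice of a fair die as the random variable is an assumption made only for the example.

```python
from fractions import Fraction

# A simple random variable: the value shown by a fair die (assumed uniform).
prob = {r: Fraction(1, 6) for r in range(1, 7)}


def expectation(g=lambda r: r):
    """E[g(f)] = sum_i g(r_i) * P({f = r_i}) for a simple random variable f."""
    return sum((g(r) * p for r, p in prob.items()), Fraction(0))


mean = expectation()                                 # first moment
second_moment = expectation(lambda r: r ** 2)        # second moment
variance = expectation(lambda r: (r - mean) ** 2)    # second central moment
```

With exact fractions, the variance obtained from the definition coincides with $E[f^2] - E[f]^2$.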


2.2.3 Independence 


An important concept that appears in probability theory is independence. 
We review some definitions and results in this section. The main idea behind 
independence is that when two events are independent, some knowledge about 
the occurrence (or nonoccurrence) of one of the events does not change the 
odds of the occurrence of the other. 

Let $(X, \mathcal{A}, P)$ be a probability space. Then, if we consider the event $B$, we
have that its probability will be $P(B)$. Next, let us consider the event $A \cap B$.
Then, we have that when $A$ and $B$ are independent, the probability that $A$
occurs does not depend on the occurrence of $B$. Informally, this means that
the proportion of $A$ in $X$ is the same as the proportion of $A$ in $B$. This is
established by the equality $P(A)/P(X) = P(A \cap B)/P(B)$.

In mathematical terms, independence is defined as follows: 


Definition 2.14. Two events $A$ and $B$ are independent if and only if
$P(A \cap B) = P(A) \cdot P(B)$.

If two events are independent, knowing something about the occurrence
(or nonoccurrence) of one of the events does not change the probability of
the other. When independence does not hold, it is relevant to consider the
conditional probability to measure the change of a probability. The conditional
probability of $A$ relative to $B$ is defined as

$$P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad (2.1)$$

provided $P(B) > 0$.
We consider now a few results for independent events. 


Theorem 2.15. Let $(X, \mathcal{A}, P)$ be a probability space. Then,
the following holds:

1. If $P(B) > 0$, then $A$ and $B$ are independent events if and only if
$P(A|B) = P(A)$.

2. If, for arbitrary events $A_1, A_2, \dots, A_n$, we have $P(A_1 \cap A_2 \cap \dots \cap A_n) > 0$,
then the following holds:

$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) P(A_2|A_1) P(A_3|A_2 \cap A_1) \cdots P(A_n|A_{n-1} \cap \dots \cap A_1).$$

3. Let $A_1, \dots, A_n$ be a partition of the sample space ($A_i \cap A_j = \emptyset$ for $i \ne j$,
and $\bigcup_i A_i = X$); then, for any $B \subseteq X$, we have

$$P(B) = \sum_{i=1}^{n} P(A_i) P(B|A_i).$$

4. Let $A_1, \dots, A_n$ be a partition of the sample space; then, for any $B \subseteq X$
such that $P(B) > 0$, we have

$$P(A_k|B) = \frac{P(A_k) P(B|A_k)}{\sum_{i=1}^{n} P(A_i) P(B|A_i)}.$$

The last property is Bayes' theorem.
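
The last two items of the theorem can be illustrated with a short computation. In the sketch below, the three-element partition, its prior probabilities, and the conditional probabilities $P(B|A_i)$ are hypothetical numbers chosen only to show the mechanics; exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical partition A1, A2, A3 with prior probabilities P(A_i) ...
prior = {"A1": Fraction(1, 2), "A2": Fraction(3, 10), "A3": Fraction(1, 5)}
# ... and hypothetical conditional probabilities P(B | A_i).
likelihood = {"A1": Fraction(1, 100), "A2": Fraction(2, 100), "A3": Fraction(5, 100)}


def total_probability(prior, likelihood):
    """P(B) = sum_i P(A_i) P(B | A_i)."""
    return sum((prior[a] * likelihood[a] for a in prior), Fraction(0))


def posterior(prior, likelihood):
    """Bayes' theorem: P(A_k | B) for every element A_k of the partition."""
    p_b = total_probability(prior, likelihood)
    return {a: prior[a] * likelihood[a] / p_b for a in prior}


p_b = total_probability(prior, likelihood)
post = posterior(prior, likelihood)
```

With these numbers, $P(B) = 21/1000$, and observing $B$ raises the probability of $A_3$ from 1/5 to 10/21.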


Now we turn to the independence of random variables. Let $(X, \mathcal{A}, P)$ be
a probability space and let $f_1, \dots, f_n$ be random variables into $(S, \mathcal{S})$; then,
independence means that knowledge about one or more $f_i$ does not change the
probability of the others. Knowledge about a random variable means knowledge
about one event of the form $A_i = \{f_i \in B\}$, where $B$ is in $\mathcal{S}$. Therefore,
independence of the random variables $f_1, \dots, f_n$ means that the events $A_1, \dots, A_n$
should be independent.

This is formally defined as follows:

Definition 2.16. Let $(X, \mathcal{A}, P)$ be a probability space and let $f_1, \dots, f_n$ be
random variables into $(S, \mathcal{S})$; then, $f_1, \dots, f_n$ are independent if and only if,
for all sets $B_1, \dots, B_n \in \mathcal{S}$, we have

$$P(\{f_1 \in B_1, \dots, f_n \in B_n\}) = P(\{f_1 \in B_1\}) \cdots P(\{f_n \in B_n\}).$$


The following results can be proved for independent variables. 


Theorem 2.17. Let $f_1, \dots, f_n$ be random variables on $(X, \mathcal{A}, P)$. Let $G_i$ be
the distribution functions of $f_i$ for $i = 1, \dots, n$, and let $G$ be the distribution
function of $f = (f_1, \dots, f_n)$. Then, $f_1, \dots, f_n$ are independent if and only if

$$G(x_1, \dots, x_n) = G_1(x_1) \cdot G_2(x_2) \cdots G_n(x_n)$$

for all real $x_1, \dots, x_n$.


Corollary 2.18. If $f_1, \dots, f_n$ are independent and $f_i$ has density $g_i$ for
$i = 1, \dots, n$, then $f$ has density $g$ given by

$$g(x_1, \dots, x_n) = g_1(x_1) \cdots g_n(x_n).$$


Theorem 2.19. Let $f_1, \dots, f_n$ be independent random variables on $(X, \mathcal{A}, P)$.
Then, if $E[f_i]$ is finite for all $i = 1, \dots, n$, $E[f_1 \cdots f_n]$ exists and the
following equation holds:

$$E[f_1 \cdots f_n] = E[f_1] \cdot E[f_2] \cdots E[f_n].$$


When random variables are not independent, it is meaningful to consider 
their covariance and correlation coefficients. The correlation coefficient (or 
Pearson’s correlation coefficient) is also known as the product moment corre- 
lation. We define it below. 


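Definition 2.20. Let $f_1, f_2$ be two random variables with finite expectation,
and assume $E[f_1 f_2]$ is finite. Then, the covariance of $f_1$ and $f_2$ is defined by

$$\mathrm{Cov}(f_1, f_2) := E[(f_1 - E[f_1])(f_2 - E[f_2])] = E[f_1 f_2] - E[f_1] E[f_2].$$

Given two sets of random variables $f = \{f_1, \dots, f_s\}$ and $f' = \{f'_1, \dots, f'_t\}$,
the matrix defined by

$$\mathrm{Cov}(f, f') = \begin{pmatrix}
\mathrm{Cov}(f_1, f'_1) & \mathrm{Cov}(f_1, f'_2) & \cdots & \mathrm{Cov}(f_1, f'_t) \\
\mathrm{Cov}(f_2, f'_1) & \mathrm{Cov}(f_2, f'_2) & \cdots & \mathrm{Cov}(f_2, f'_t) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(f_s, f'_1) & \mathrm{Cov}(f_s, f'_2) & \cdots & \mathrm{Cov}(f_s, f'_t)
\end{pmatrix}$$

is the covariance matrix. When both sets consist of the same single variable $f$,
this definition generalizes the variance: Var($f$) = Cov($f$, $f$).
Note that $\sigma^2(f) = \mathrm{Var}(f) = E[(f - E[f])^2]$.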

If $f_1$ and $f_2$ are independent, then the covariance is zero. Nevertheless, the
converse is not true. For example, $f_1 = \cos(\theta)$ and $f_2 = \sin(\theta)$, where $\theta$ is
uniformly distributed between 0 and $2\pi$, have covariance zero although they
are clearly dependent (they satisfy $f_1^2 + f_2^2 = 1$).
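
This counterexample can be checked numerically. The sketch below replaces the uniform distribution of the angle by an equally spaced grid of angles (an approximation assumed for the illustration) and computes the covariance of the cosine and sine, which vanishes, as well as the covariance of their squares, which does not. The nonzero second value reflects the dependence between the two variables.

```python
import math

# Equally spaced grid of angles standing in for the uniform distribution
# on [0, 2*pi) (an approximation used only for this illustration).
N = 10_000
thetas = [2 * math.pi * k / N for k in range(N)]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Covariance E[(u - E[u]) (v - E[v])] under the uniform grid."""
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])


f1 = [math.cos(t) for t in thetas]
f2 = [math.sin(t) for t in thetas]

cov_f1_f2 = cov(f1, f2)                                     # close to 0
cov_squares = cov([a * a for a in f1], [b * b for b in f2])  # close to -1/8
```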


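Definition 2.21. Let $f_1$ and $f_2$ be two random variables such that $\sigma^2(f_1)$ and
$\sigma^2(f_2)$ are finite and greater than zero; then, the correlation coefficient between
$f_1$ and $f_2$ is defined by

$$\rho(f_1, f_2) = \frac{\mathrm{Cov}(f_1, f_2)}{\sqrt{\sigma^2(f_1)\,\sigma^2(f_2)}}.$$

Equivalently, we have

$$\rho(f_1, f_2) = \frac{E[(f_1 - E[f_1])(f_2 - E[f_2])]}{\sqrt{E[(f_1 - E[f_1])^2]\, E[(f_2 - E[f_2])^2]}}.$$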


Proposition 2.22. The correlation coefficient satisfies the following proper- 
ties: 


1. $-1 \le \rho(f_1, f_2) \le 1$

2. $|\rho(f_1, f_2)| = 1$ if and only if $f'_1 = f_1 - E[f_1]$ and $f'_2 = f_2 - E[f_2]$ are
linearly dependent. That is, if $P(\{a f'_1 + b f'_2 = 0\}) = 1$ for some real
numbers $a$ and $b$, not both zero.


So, in general, the nearer $\rho$ is to $-1$ or to 1, the more linear is the model.


2.2.4 Parametric Models and Nonparametric Methods 


Tout le monde y croit cependant, me disait un jour M. Lippmann, car les 
expérimentateurs s’imaginent que c'est un théorème de mathématiques, 
et les mathématiciens que c'est un fait expérimental.³


H. Poincaré, [324] 


Given a set of observations, parametric models permit us to represent data in 
a compact way. Once a particular model is properly selected, a huge amount of 
data can be reduced into the few parameters that the model requires. Normal 
(either univariate or multivariate normal) distributions and $\chi^2$ distributions are examples
of parametric models. Then, parametric techniques rely on the properties 
of parametric distributions. Alternatively, nonparametric methods have been 
developed so that they can be applied when no parametric model can be made 
to fit the data. So, these methods do not require strong assumptions on the 
data distribution. 

The normal distribution is one of the parametric models. With it, a whole data set
is reduced to two values: mean and variance. Normal distributions are defined
in the next examples.


³ Everyone believes in it, Mr. Lippmann said to me one day, because the
experimenters think it is a theorem of mathematics, and the mathematicians think it
is an experimental fact.

Example 2.23. A random variable $f$ follows a (univariate) normal distribution
with mean $\mu$ and variance $\sigma^2$ if the probability density function of $f$ for
$x \in (-\infty, \infty)$ is of the form

$$g_f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}(x-\mu)^2/\sigma^2}.$$


Example 2.24. A random variable $f$ follows a multivariate normal distribution
of dimension $n$ with mean vector $\mu$ and variance-covariance matrix $\Sigma$ if the
probability density function of $f$ is of the form

$$g_f(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} e^{-\frac{1}{2}(x-\mu)' \Sigma^{-1} (x-\mu)},$$

where $|\Sigma|$ denotes the determinant of the variance-covariance matrix $\Sigma$.


Note that, in the multivariate case, $\mu, x \in \mathbb{R}^n$ for a given $n$.
When $f$ follows a normal distribution, we write $f \sim N(\mu, \sigma^2)$ or
$f \sim N(\mu, \Sigma)$.


2.2.5 Regression 


Pour cet effet, la méthode qui me paroit la plus simple et la plus 
générale, consiste à rendre minimum la somme des quarrés des 
erreurs. On obtient ainsi autant d'équations qu'il y a de coéfficiens 
inconnus; ce qui achève de déterminer tous les élémens de l'orbite.⁴


Legendre, A. M., p. viii [224] 


In this section we present an overview of regression, in which we construct a 
model of one variable in terms of some other variables. We restrict ourselves 
to the case of linear models. So, the outcome is a linear combination of the 
input variables. 

Let us start by considering a simple model concerning two random variables
$f_1$ and $f_2$. In this case, if $f_2$ is expressible in terms of $f_1$ following a linear
model $\beta_0 + \beta_1 f_1$, we have that the following equation holds when there is some
error involved in the process:

$$f_2 = \beta_0 + \beta_1 f_1 + \epsilon.$$

Here, $f_1$ is the explanatory variable (or carrier, regressor, or the independent
variable), $f_2$ is the response variable (or the dependent variable), $\epsilon$ is the
error (in fact, another random variable), and $\beta_0$ and $\beta_1$ are the parameters of
the model.


⁴ For this purpose, the method that seems to me the simplest and the most general
consists of minimizing the sum of the squares of the errors. We obtain in this way
as many equations as there are unknown coefficients, which completes the
determination of all the elements of the orbit.

Fig. 2.3. Graphical representation of least sum of squares of the vertical distances


Index   Year ($x_i$)   Export revenues ($y_i$)
1       1970           59
2       1975           111
3       1980           314
4       1985           653
5       1990           903
6       1995           1209
7       2000           1425

Table 2.3. Japanese annual sales revenue from exports to U.S., in hundred million
dollars


Now, given some particular data (such as in Table 2.3), we might be able
to estimate $\beta_0$ and $\beta_1$ so that the model fits the data. This is classically
solved by minimizing the sum of the squares of the vertical distances (see
Figure 2.3). This method is known as least sum of squares (LSS) (or least
squares). Formally, we consider the error between the observed value of $f_2$
and the linear model computed for $f_1$:

$$f_2 - (\beta_0 + \beta_1 f_1).$$

Thus, when the particular data for variables $f_1$ and $f_2$ corresponds to the
pairs $\{(x_i, y_i)\}_{i=1}^{n}$, we consider minimization of the following error:

$$ERR(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2 \qquad (2.2)$$
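
To solve this problem, we consider the partial derivatives of the expression
ERR with respect to the two parameters $\beta_0$ and $\beta_1$, and then set these
derivatives to zero. That is,

$$\frac{\partial ERR}{\partial \beta_0} = 2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)(-1) = 0$$

and

$$\frac{\partial ERR}{\partial \beta_1} = 2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)(-x_i) = 0.$$

The solution of these expressions is:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)/n}{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2/n}, \qquad (2.3)$$

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}. \qquad (2.4)$$

As the determinant of the matrix of second-order partial derivatives of ERR
is positive, the solutions correspond to the minimum of ERR. $\hat\beta_0$ and $\hat\beta_1$ are
estimators of $\beta_0$ and $\beta_1$, respectively.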


Example 2.25. Let us consider the data in Table 2.3. This data is represented 
in Figure 2.4. It can be observed that this data follows a linear model, except 
for the first observation. 

Accordingly, we consider the regression using a linear model of the form

revenue = $\beta_0 + \beta_1 \cdot$ year.

Using all observations in Table 2.3, the linear regression model we obtain
(using Expressions 2.4 and 2.3) has the following estimates for $\beta_0$ and $\beta_1$:

• $\hat\beta_0 = -96923.393$
• $\hat\beta_1 = 49.164$

So, the model is

revenue = -96923.393 + 49.164 · year.

In contrast, when the first observation is not included in the computations,
we obtain the following model:

revenue = -107180.476 + 54.314 · year.

The models are represented in Figure 2.4, together with the original observations.
It can be seen that the second model fits the original data more closely,
while the first one is slightly displaced to accommodate the first observation.
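
The estimates of this example can be reproduced with a direct transcription of Expressions 2.3 and 2.4. The sketch below assumes that the seven observations correspond to the years 1970, 1975, ..., 2000; this assumption is consistent with the years 1970 and 1995 mentioned for observations 1 and 6, and it reproduces the reported coefficients up to rounding. The function name `least_squares` is illustrative.

```python
# Assumed years for the seven observations (five-year steps from 1970).
years = [1970, 1975, 1980, 1985, 1990, 1995, 2000]
# Export revenues, in hundred million dollars.
revenues = [59, 111, 314, 653, 903, 1209, 1425]


def least_squares(xs, ys):
    """Return (beta0, beta1) minimizing the sum of squared vertical distances."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    beta1 = (sxy - sx * sy / n) / (sxx - sx * sx / n)   # Expression 2.3
    beta0 = sy / n - beta1 * sx / n                      # Expression 2.4
    return beta0, beta1


# Model using all observations.
b0_all, b1_all = least_squares(years, revenues)
# Model leaving out the first observation.
b0_sub, b1_sub = least_squares(years[1:], revenues[1:])
```

Dropping the first observation increases the slope and lowers the intercept, which matches the two models given in the example.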

Fig. 2.4. Representation of the data in Table 2.3 and regression models: (a) regression
using all observations; (b) regression using all observations except the first one,
corresponding to the year 1970.


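In a more general case, we can consider $r$ explanatory variables $\{X_1, \dots, X_r\}$,
and one response variable $Y$. In this case, when a linear model is considered,
we have

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_r X_r + \epsilon. \qquad (2.5)$$

When $n$ observations of the form $\{(x_{i1}, \dots, x_{ir}, y_i)\}$ are given, we should
consider $n$ equations of the form:

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_r x_{ir} + \epsilon_i$$

for $i = 1, \dots, n$, which, considering $x_{i0} = 1$ for all $i$ in $1, \dots, n$, can be put in
matrix form as follows:

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
\begin{pmatrix}
x_{10} & x_{11} & \cdots & x_{1r} \\
x_{20} & x_{21} & \cdots & x_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n0} & x_{n1} & \cdots & x_{nr}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{pmatrix} +
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

That is,

$$Y = X\beta + \epsilon.$$

As with the simplest case of one explanatory variable, in the general case
we also use the least sum of squares. Again, this is to minimize the error with
respect to $\beta$. In matrix form, this is expressed as the minimization of the
following expression: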

Let $X$ be a matrix with dimensions $m \times n$; then, its generalized inverse $X^-$ is another
matrix with dimensions $n \times m$ such that the following equation holds:

$$X X^- X = X$$

For each $X$, there is at least one generalized inverse, but it is not necessarily unique.
One method for computing $X^-$ when $X$ is a symmetric square matrix of dimension
$n$ and rank $r$ is as follows:

(i) Take $X$ and delete $n - r$ rows and the corresponding columns to obtain an $r \times r$
nonsingular principal matrix. Let $M = (m_{ij})$ denote the new matrix. Note that
$M$ exists because the rank of $X$ is $r$.

(ii) Invert $M$ and obtain $M^{-1}$. Let us denote the elements in $M^{-1}$ by $m^{ij}$.

(iii) Take $X$ again and replace the elements $m_{ij}$ of $M$ by the elements $m^{ji}$ of $M^{-1}$.
Replace the other elements of $X$ not in $M$ by zero. The matrix obtained is a
generalized inverse of $X$.

Fig. 2.5. Generalized inverse of a matrix X


$$ERR(\beta) = \|Y - X\beta\|^2.$$


When the columns of $X$ are linearly independent, there exists a unique
vector $\hat\beta$ that minimizes this error (i.e., there is a unique solution of the
minimization problem). This solution is

$$\hat\beta = (X'X)^{-1} X'Y.$$

When the columns of $X$ are not linearly independent, a solution is given
by

$$\hat\beta = (X'X)^{-} X'Y.$$

Here, $(X'X)^{-}$ is a generalized inverse of $(X'X)$. The computation of a generalized
inverse is given in Figure 2.5.

Once a model is built, it is necessary to evaluate to what extent the model
fits the data, i.e., to measure the goodness of fit of the model. This can be done
using $\rho^2$, i.e., the square of the correlation coefficient between the observed
variable $Y$ and the predicted $\hat{Y}$. This is formally defined by (see Definition 2.21
for details)

$$\rho^2 = \frac{\left(\sum_i (y_i - \bar{y})(\hat{y}_i - \bar{y})\right)^2}{\sum_i (y_i - \bar{y})^2 \sum_i (\hat{y}_i - \bar{y})^2}. \qquad (2.6)$$

An equivalent expression can be used for $\rho^2$. It is defined in terms of the
sum of squares due to regression SSR (SSR = $\sum_i (\hat{y}_i - \bar{y})^2$) and the total sum
of squares SST (SST = $\sum_i (y_i - \bar{y})^2$). This expression is as follows:

$$\rho^2 = \frac{SSR}{SST} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}, \qquad (2.7)$$

with $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$.
Note that the equivalence of Expressions 2.6 and 2.7 uses the fact that we
are considering a linear model (represented in Equation 2.5).
Denoting the sum of squares of errors by SSE (i.e., SSE = $\sum_i (y_i - \hat{y}_i)^2$),
we have that the following equation holds:

$$SST = SSE + SSR.$$

That is,

$$\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2.$$

Here, as $\sum_i (y_i - \bar{y})^2$ measures the variability in $Y$, Expression 2.7 can be
interpreted as the proportion of the variability in $Y$ that is explained by the
model. Naturally, the higher the proportion, the better the model. Recall that
$\rho^2$ is between 0 and 1.


Example 2.26. The goodness of fit of the regression models in Example 2.25
is as follows:

• $\rho^2 = 0.97937$ (regression using all observations)
• $\rho^2 = 0.99637$ (regression using all observations except the first one)
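
These values can be recomputed from the decomposition SST = SSE + SSR. The sketch below again assumes that the observations correspond to the years 1970, 1975, ..., 2000 (an assumption consistent with the reported regression coefficients); the function names are illustrative. The results agree with the figures above to about four decimal places.

```python
# Assumed years for the seven observations (five-year steps from 1970).
years = [1970, 1975, 1980, 1985, 1990, 1995, 2000]
revenues = [59, 111, 314, 653, 903, 1209, 1425]


def fit_line(xs, ys):
    """Least-squares estimates (beta0, beta1), computed from centered sums."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    return y_bar - beta1 * x_bar, beta1


def sums_of_squares(xs, ys):
    """Return (SSR, SSE, SST) for the least-squares line through the data."""
    beta0, beta1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hat = [beta0 + beta1 * x for x in xs]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return ssr, sse, sst


ssr_all, sse_all, sst_all = sums_of_squares(years, revenues)
rho2_all = ssr_all / sst_all                     # all observations

ssr_sub, sse_sub, sst_sub = sums_of_squares(years[1:], revenues[1:])
rho2_sub = ssr_sub / sst_sub                     # first observation left out
```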


The linear model developed so far is based on several assumptions. They 
are highlighted below: 


1. The linear model is an adequate approximation. That is, Equation 2.5 is 
appropriate to represent Y as a function of X. 

2. The variance of the error $\epsilon_i$ is independent of the observation. That is,
Var($\epsilon_i$) = $K$ for all $i$.

3. The errors are uncorrelated: $\epsilon_i$ is correlated neither with $\epsilon_j$ for $i \ne j$ nor
with $X$.

4. $\epsilon$ should follow a normal distribution with mean equal to zero.


Other regression methods have been developed for situations in which
these assumptions do not hold. For example, weighted regression can be
applied when the variance is not constant for all $\epsilon_i$. Robust regression methods
(see Section 2.2.8) are some other tools available for these situations.


2.2.6 Robust Statistics 


The point of robust statistics is that one 
may keep a parametric model although 
the latter is known to be wrong 


Hampel et al., p. 403 [181] 
Fig. 2.6. Representation of the data in Table 2.3, and the same data with some
perturbation: (a) data and regression using the original observations; (b) data and 
regression with a perturbation on the x-axis (1995 changed into 1965 for observation 
6); (c) data and regression with a perturbation on the y-axis (59 changed into 1429 
in observation 1); (d) data and regression with a perturbation on the x- and y-axis 
(pair (1970, 59) changed into (1999, 1400) in observation 1). The corresponding data 
sets are presented in Table 2.4 


The actual practice of statistical methods with real data poses several
problems. An important one is due to the fact that data usually contain errors and
do not always fit the assumed data model. The differences between model and 
real data can cause the results obtained to deviate from the ones that would 
be obtained if errors were not present, i.e., to deviate from the results obtained 
in an ideal situation. Robust statistics has been developed to deal with this 
situation. 

Differences between model and real data can be due to different reasons. 
Some of them are the following: 
Table 2.4. Japanese annual sales revenue from exports to U.S., in hundred million
dollars: (a) original data; (b) data with error in the year; (c) data with error in the 
revenues; (d) data with errors in both year and revenue 


• Errors in the data, either intentional or accidental. Intentional errors include
rounding (or grouping) or censoring (e.g., applying masking methods
for data protection). Some of the errors can be hidden in the data while
others can cause important damage to the conclusions of an analysis.
Table 2.4 and Figure 2.6 represent four data sets, an original one (considered
in Example 2.25), and three others obtained from it with some perturbations
on the x and/or y axes. The figure also represents the corresponding
linear regression model using the least sum of squares method. It can be
observed that the perturbation causes changes in the linear model. In
fact, extreme perturbation of a single value might even change the sign of
the $\beta_1$ coefficient.

• Assumptions are violated; some of the assumptions of the model do not fit.
For example, assumptions on independence do not hold for some variables.


Tools have been developed so that data deviating from the model does not
affect the result in a significant way. In other words, robust statistics develops
procedures in a way that their behavior in the neighborhood of parametric
models is similar to the one obtained with the data completely fulfilling the
model. So, in the case of the data represented in Figure 2.6, the goal would
be to obtain in all cases a result as similar as possible to the one in Figure 2.4
(a).

In order to characterize to what extent a procedure is robust, some con- 
cepts have been defined. The breakdown point is one of these concepts. In- 
formally, the breakdown point corresponds to the minimal proportion of bad 
data that can ruin the output of the procedure. 

As an example we can consider the mean and the median of a sample
$(x_1, \dots, x_N)$. If we replace any of the $x_i$ by $x'_i$, it is clear that for $x'_i \to \infty$
the mean $(\sum_{j \ne i} x_j + x'_i)/N$ is unbounded while the median is bounded.
As will be seen later, the influence functions and the breakdown point of
mean and median show that the latter is more robust. We review below some
concepts related to robust statistics.

First, we consider the influence function. This notion, which is a local
measure, quantifies the influence of infinitesimal perturbations
on a statistic. It is formalized as follows.


Definition 2.27. Let $G$ be a probability distribution and let $T$ be an estimator;
then, the influence function (IF) of the estimator $T$ at the distribution $G$ is
given by

$$IF(x; T, G) = \lim_{\epsilon \to 0} \frac{T((1 - \epsilon) G + \epsilon \Delta_x) - T(G)}{\epsilon}$$

in the $x$ where this limit exists.

$\Delta_x$ denotes the probability distribution that puts all its mass in $x$.


The influence function is a linear approximation of the estimator for a
distribution contaminated by amount $\epsilon$. That is,

$$T((1 - \epsilon) G + \epsilon \Delta_x) \approx T(G) + \epsilon\, IF(x; T, G).$$

For finite samples, several alternative expressions exist. The sensitivity
curve and the empirical influence function are two of the existing expressions.


Definition 2.28. Let $S = \{x_1, \dots, x_{n-1}\}$ be a sample and let $T_n$ for $n > 1$
be an estimator; then, the sensitivity curve of $T$ for the sample $S$ at $x$ is

$$SC(x; T_n, S) := n \left( T_n(x_1, \dots, x_{n-1}, x) - T_{n-1}(x_1, \dots, x_{n-1}) \right).$$


Definition 2.29. Let $S = \{x_1, \dots, x_{n-1}\}$ be a sample and let $T_n$ for $n > 1$
be an estimator; then, the empirical influence function of $T$ for the sample $S$
at $x$ is

$$EIF(x; T_n, S) := T_n(x_1, \dots, x_{n-1}, x).$$


Example 2.30. Let $S = \{0.5, 0.25, 0.8, 0.75\}$; then, the EIF of the arithmetic
mean (AM) at the sample $S$ is

$$EIF(x; AM, S) = \frac{0.5 + 0.25 + 0.8 + 0.75 + x}{5} = 0.46 + 0.2x.$$

The EIF of the median at the sample $S$ is

$$EIF(x; \text{median}, S) = \begin{cases} 0.5 & \text{if } x < 0.5 \\ x & \text{if } 0.5 \le x \le 0.75 \\ 0.75 & \text{if } x > 0.75. \end{cases}$$


So, the EIF of the mean is unbounded while that of the median is bounded.
Note that the presence of an erroneous measurement affects the outcome of
the median as it does in the case of the mean: in both cases the outcome shifts
in the direction of the error. Nevertheless, the influence of the error is
limited in the case of the median, but not in the case of the mean, as the EIF
indicates.


Fig. 2.7. Influence functions (IF) for the arithmetic mean (AM) and the median:
IF(x; AM, G) and IF(x; median, G)
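

Example 2.30 can be checked numerically. The following is a minimal Python
sketch rather than the authors' implementation: the helper name eif is our own,
and the estimators are taken from the standard statistics module.

```python
from statistics import mean, median

S = [0.5, 0.25, 0.8, 0.75]  # sample of Example 2.30


def eif(estimator, sample, x):
    """Empirical influence function: the estimator applied to the sample plus x."""
    return estimator(sample + [x])


# The mean follows the extra observation without bound; the median stays
# inside [0.5, 0.75] whatever the value of x.
for x in (-100.0, 0.6, 100.0):
    print(x, eif(mean, S, x), eif(median, S, x))
```

Evaluating eif on a range of x values gives the two curves of the example: a
straight line for the mean and a clipped line for the median.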


The influence function can be used for computing the gross-error sensitivity 
and the local-shift sensitivity. They are defined as follows: 


Definition 2.31. Let G be a probability distribution and let T be an estimator. 
Then, 


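1. The gross-error sensitivity of $T$ at $G$ is defined as

$$\gamma^*(T, G) := \sup_x |\mathrm{IF}(x; T, G)|,$$

where the supremum is taken over those $x$ where $\mathrm{IF}(x; T, G)$ exists.

2. The local-shift sensitivity of $T$ at $G$ is defined by

$$\lambda^*(T, G) := \sup_{x \neq y} \frac{|\mathrm{IF}(y; T, G) - \mathrm{IF}(x; T, G)|}{|y - x|}.$$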


Here, the gross-error sensitivity gives information about the worst case 
situation. It measures the largest influence as computed using IF. In robust 
procedures, this sensitivity is expected to be finite. This is not the case for 
the arithmetic mean. 

The local-shift sensitivity measures the influence of replacing the observa- 
tion x by some y. 


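Example 2.32. Let $G$ be the normal distribution $N(0, 1)$; then, the influence
functions of the arithmetic mean (AM) and the median at the distribution $G$
correspond to the following expressions (see Figure 2.7):

$$\mathrm{IF}(x; \mathrm{AM}, G) = x$$

$$\mathrm{IF}(x; \mathrm{median}, G) = \begin{cases} -\sqrt{\pi/2} & \text{if } x < 0 \\ \sqrt{\pi/2} & \text{if } x > 0. \end{cases}$$

Their gross-error sensitivity is

• $\gamma^*(\mathrm{AM}, G) = \infty$
• $\gamma^*(\mathrm{median}, G) = \sqrt{\pi/2}$

and the local-shift sensitivity corresponds to

• $\lambda^*(\mathrm{AM}, G) = 1$
• $\lambda^*(\mathrm{median}, G) = \infty$

Note that $\lambda^*(\mathrm{AM}, G) = 1$ because
$\mathrm{IF}(x; \mathrm{AM}, G) = x$, and thus we have
$\lambda^*(\mathrm{AM}, G) = \sup_{x \neq y} |y - x| / |y - x| = 1$. In
contrast, in the case of the median, we have, for $x < 0$,
$\mathrm{IF}(x; \mathrm{median}, G) = -\sqrt{\pi/2}$, but, for $x > 0$,
$\mathrm{IF}(x; \mathrm{median}, G) = \sqrt{\pi/2}$. The numerator of the
quotient therefore stays at $2\sqrt{\pi/2}$ while $|y - x|$ can be made
arbitrarily small by taking $x < 0 < y$ close to zero, so the supremum is
infinite.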





The concepts defined above are local ones, as they only refer to the varia- 
tion of the output for a single observation. In contrast, the breakdown point 
is a global concept, as it takes into account the effect of changes in the whole 
distribution. More specifically, the breakdown point establishes the fraction of the
data that can cause the estimator to lead to a meaningless result. There exist 
several definitions for the breakdown point on finite samples. One of them is 
given below. 


Definition 2.33. Let $S = \{x_1, \ldots, x_n\}$ be a sample, and let $T_n$ be
an estimator; then, the breakdown point of the estimator $T$ at the sample $S$
is defined as

$$\epsilon_n^*(T, S) := \frac{1}{n} \min\{m; \mathrm{bias}(m; T, S) = \infty\},$$

with $\mathrm{bias}(m; T, S)$ defined by

$$\mathrm{bias}(m; T, S) := \sup_{S' \in R(S, m)} \|T(S') - T(S)\|,$$

where $R(S, m)$ represents all samples obtained from $S$ with $m$ original
observations replaced by arbitrary values.


Note that, here, $\mathrm{bias}(m; T, S) < \infty$ means that the effect of $m$
perturbations is bounded, while $\mathrm{bias}(m; T, S) = \infty$ means that it
is not. Therefore, $\epsilon_n^*(T, S)$ corresponds to the smallest fraction of
contamination with unbounded effect. Naturally, the larger the breakdown point,
the better.
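
The bias in Definition 2.33 is a supremum over all contaminated samples, which
cannot be enumerated. As a rough illustration only, the Python sketch below
stands in for that supremum by replacing m observations with a single very
large value and checking whether the estimate moves by more than a threshold.
The sample values, the function names, and the constants 1e12 and 1e6 are our
own assumptions.

```python
from statistics import mean, median


def contaminate(sample, m, bad=1e12):
    """Replace the first m observations by one arbitrarily large value."""
    return [bad] * m + list(sample[m:])


def smallest_breaking_m(estimator, sample, threshold=1e6):
    """Smallest m for which the contaminated estimate moves past the threshold.

    A finite stand-in for bias(m; T, S) = infinity, not the exact definition.
    """
    base = estimator(sample)
    for m in range(1, len(sample) + 1):
        if abs(estimator(contaminate(sample, m)) - base) > threshold:
            return m
    return None


S = [0.5, 0.25, 0.8, 0.75, 0.6, 0.4, 0.55, 0.7, 0.3]  # made-up sample, n = 9
n = len(S)
print("mean:  ", smallest_breaking_m(mean, S), "of", n)    # one bad value suffices
print("median:", smallest_breaking_m(median, S), "of", n)  # about half are needed
```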

In this definition, the smallest possible breakdown point is $1/n$, and the
largest one is around 1/2. In fact, the breakdown point of the arithmetic mean
is $1/n$ and that of the median is 50%.


2.2.7 M- and L-Estimators


Robust statistics has studied several families of estimators. Here, we review 
two families: M-estimators and L-estimators. The first take their name from 
generalized maximum likelihood and the second from linear combination of 
order statistics. 


Definition 2.34. Given a sample $x_1, \ldots, x_n$, an estimator $T_n$ is an
M-estimator if $T_n$ minimizes

$$\sum_{i=1}^{n} \rho(x_i, T_n)$$

for an arbitrary function $\rho$.

The following is an alternative definition.


Definition 2.35. Let $\psi$ be equal to
$\psi(x, \theta) = (\partial / \partial\theta) \rho(x, \theta)$ when $\rho$ has
a derivative. Then, $T_n$ is an M-estimator if it satisfies

$$\sum_{i=1}^{n} \psi(x_i, T_n) = 0.$$


Next, we define L-estimators.


Definition 2.36. Given a sample $x_1, \ldots, x_n$, an estimator $T_n$ is an
L-estimator if $T_n$ is of the form

$$T_n(x_1, \ldots, x_n) = \sum_{i=1}^{n} c_i x_{s(i)},$$

where $x_{s(1)}, \ldots, x_{s(n)}$ is the ordered sample (e.g.,
$x_{s(1)} \le x_{s(2)} \le \cdots \le x_{s(n)}$) and the $c_i$ are coefficients.
As $x_{s(i)}$ corresponds to the $i$th order statistic, $T_n$ is a linear
combination of order statistics.
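
Both families can be illustrated with a few lines of Python. The sketch below
is ours and deliberately naive: the M-estimator is found by brute force over a
grid of candidate values (an assumption made for simplicity, not a recommended
method), and the two loss functions and the trimmed-mean coefficients are
illustrative choices.

```python
from statistics import mean, median


def m_estimate(sample, rho, grid):
    """M-estimator by brute force: the grid value minimizing sum_i rho(x_i, t)."""
    return min(grid, key=lambda t: sum(rho(x, t) for x in sample))


def l_estimate(sample, coeffs):
    """L-estimator: linear combination of the order statistics."""
    return sum(c * x for c, x in zip(coeffs, sorted(sample)))


S = [0.5, 0.25, 0.8, 0.75, 0.6]
grid = [i / 1000 for i in range(1001)]  # candidate estimates in [0, 1]

t_sq = m_estimate(S, lambda x, t: (x - t) ** 2, grid)  # squared loss
t_abs = m_estimate(S, lambda x, t: abs(x - t), grid)   # absolute loss

trimmed = l_estimate(S, [0, 1 / 3, 1 / 3, 1 / 3, 0])   # drops smallest and largest
med = l_estimate(S, [0, 0, 1, 0, 0])                   # the median as an L-estimator

print(t_sq, mean(S))     # squared loss recovers the arithmetic mean
print(t_abs, median(S))  # absolute loss recovers the median
```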


Recall how the order statistics are defined. 


Definition 2.37. Let $i$ be an index $i \in \{1, \ldots, N\}$; then, a mapping
$OS_i : \mathbb{R}^N \to \mathbb{R}$ is the $i$th order statistic of dimension
$N$ if and only if

$$OS_i(a_1, \ldots, a_N) = a_{s(i)},$$

where $s$ is a permutation of $\{1, \ldots, N\}$ such that
$a_{s(i)} \le a_{s(i+1)}$ for $i \in \{1, \ldots, N-1\}$.


2.2.8 Robust Regression


Standard regression using the least sum of squares has a breakdown point of 1/n.
Therefore, the model is highly affected by errors in the data. The goal of 
robust regression is to build models that are resilient to errors in the data, so 
that the regression model obtained from data with errors is not much different 
from the one obtained without erroneous data. To illustrate, let us consider 
again the data in Figure 2.4. A robust regression method would obtain for all 
data sets a model similar to the one of Figure 2.4 (a). 

We consider below two methods for robust regression: the Least Median 
of Squares (LMS) and the Least Trimmed Squares (LTS). 

Given the set of observations $\{(x_i, y_i)\}_{i=1,\ldots,n}$, we have that LMS
corresponds to finding the parameters $\beta = (\beta_0, \beta_1)$ that
minimize the following expression:

$$ERR_{LMS}(\beta_0, \beta_1) := r_{s(h)},$$

where $r_i = |y_i - (\beta_0 + \beta_1 x_i)|$, and $r_{s(j)}$ is the $j$th
element in $r_i$ when the residuals are ordered in increasing order. That is,
$r_{s(1)} \le r_{s(2)} \le \cdots \le r_{s(n)}$. When $h = n/2$, this approach
corresponds to Equation 2.2, replacing the sum by the median. Note that, as
seen in Example 2.32, the median is more robust than the arithmetic mean.

LTS corresponds to finding the parameters $\beta = (\beta_0, \beta_1)$ that
minimize

$$ERR_{LTS}(\beta_0, \beta_1) := \sum_{j=1}^{h} d_{s(j)}, \qquad (2.8)$$

where $d_i = |y_i - (\beta_0 + \beta_1 x_i)|^2$, and $d_{s(j)}$ corresponds to
the $j$th element in $d_i$ when the squared residuals are ordered in increasing
order.

The minimization of Equation 2.8 ignores all residuals $d_{s(j)}$ for $j > h$,
which are the largest ones. In this way, the influence of $n - h$ erroneous
data in the model is reduced.

Note that the solution of the LMS problem (minimization of $r_{s(h)}$) is
equivalent to the solution of minimizing $d_{s(h)}$, because the indices $i$
for the ordering $r_{s(i)}$ correspond to the indices for the ordering
$d_{s(i)}$. The value of $h$ is given and should be larger than $n/2$. For
linear models with $p$ parameters, an optimal value for $h$ is the integer part
of $(n + p + 1)/2$. Thus, in our case, with $\beta_0$ and $\beta_1$, it
corresponds to $h$ being equal to the integer part of $(n + 2 + 1)/2$. That is,
$h = \lfloor (n + 3)/2 \rfloor$. The breakdown point for such linear regression
models and with such a value for $h$ is equal to 50%.

To compute the optimal $\beta = (\beta_0, \beta_1)$, the following steps are
required:


1. Consider all pairs of two observations $o = (x, y)$ and $o' = (x', y')$
   drawn from the original set of data.

   a) Determine $\beta_0^{oo'}$ and $\beta_1^{oo'}$ so that the line
      $y = \beta_0^{oo'} + \beta_1^{oo'} x$ goes through observations $o$ and
      $o'$. This problem, solved with a system of two unknowns and two
      equations, results in $\beta_1^{oo'} = (y - y')/(x - x')$ and
      $\beta_0^{oo'} = y - \beta_1^{oo'} x$.

   b) Ignore $\beta_0^{oo'}$, and define $\hat{\beta}_0^{oo'}$ as the value
      that minimizes the corresponding error given $\beta_1^{oo'}$. That is,
      either

      $$\hat{\beta}_0^{oo'} = \arg\min_{\beta_0} ERR_{LMS}(\beta_0, \beta_1^{oo'})$$

      or

      $$\hat{\beta}_0^{oo'} = \arg\min_{\beta_0} ERR_{LTS}(\beta_0, \beta_1^{oo'}).$$

      The minimization problem is solved as follows.

      For the LMS problem, (i) define $r_i = y_i - \beta_1^{oo'} x_i$; (ii)
      order $r_i$ and obtain $r_{s(i)}$ so that $r_{s(i)} \le r_{s(i+1)}$;
      (iii) determine the length of the contiguous intervals containing $h$
      points and the midpoint of such intervals (that is,
      $l_i = |r_{s(i+h-1)} - r_{s(i)}|$ and
      $m_i = (r_{s(i+h-1)} + r_{s(i)})/2$, for $1 \le i \le n - h + 1$); (iv)
      find the minima of the lengths, and then return the median of the
      corresponding midpoints (that is, if $I = \{i \mid l_i = \min_j l_j\}$,
      then return $\mathrm{median}_{i \in I}\, m_i$).

      For the LTS problem, (i) define $r_i = y_i - \beta_1^{oo'} x_i$; (ii)
      order $r_i$ and obtain $r_{s(i)}$ so that $r_{s(i)} \le r_{s(i+1)}$;
      (iii) for each contiguous interval containing $h$ points, determine the
      mean and the sum of the squared deviations from the mean (that is,
      $\bar{r}_i = \sum_{j=i}^{i+h-1} r_{s(j)} / h$ and
      $d_i = \sum_{j=i}^{i+h-1} (r_{s(j)} - \bar{r}_i)^2$); (iv) find the
      minima of the sums of squared deviations, and return the corresponding
      mean (that is, if $I = \{i \mid d_i = \min_j d_j\}$, then return
      $\bar{r}_i$).

   c) Define $e^{oo'}$ as follows.
      For the LMS problem,
      $e^{oo'} := r_{s(h)}$, where $r_i := |y_i - (\hat{\beta}_0^{oo'} + \beta_1^{oo'} x_i)|$.
      For the LTS problem,
      $e^{oo'} := \sum_{j=1}^{h} d_{s(j)}$, where $d_i := |y_i - (\hat{\beta}_0^{oo'} + \beta_1^{oo'} x_i)|^2$.

2. Return the pair of parameters $\hat{\beta}_0^{oo'}$ and $\beta_1^{oo'}$ with
   a minimum error $e^{oo'}$.
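

The steps above can be sketched in Python as follows. This is our reading of
the procedure for one explanatory variable, not the authors' Java code. It
assumes $h = \lfloor (n + 3)/2 \rfloor$, skips pairs with equal x (where the
slope is undefined), and, for LTS, keeps the first interval when several
intervals tie. All function names and the test data are our own.

```python
from itertools import combinations
from statistics import median


def best_intercept(r, h, method):
    """Step (b): intercept minimizing the LMS or LTS error for a fixed slope."""
    r = sorted(r)
    starts = range(len(r) - h + 1)  # contiguous runs of h ordered values
    if method == "LMS":
        lengths = [r[i + h - 1] - r[i] for i in starts]
        mids = [(r[i + h - 1] + r[i]) / 2 for i in starts]
        shortest = min(lengths)
        return median(m for l, m in zip(lengths, mids) if l == shortest)
    means = [sum(r[i:i + h]) / h for i in starts]
    ssd = [sum((v - m) ** 2 for v in r[i:i + h]) for i, m in zip(starts, means)]
    return means[ssd.index(min(ssd))]  # first minimum if there are ties


def error(points, b0, b1, h, method):
    """Step (c): h-th smallest residual (LMS) or trimmed sum of squares (LTS)."""
    res = sorted(abs(y - (b0 + b1 * x)) for x, y in points)
    return res[h - 1] if method == "LMS" else sum(v * v for v in res[:h])


def robust_line(points, method="LMS"):
    """Return (b0, b1) with minimum error over all pairs of observations."""
    n = len(points)
    h = (n + 3) // 2
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # no finite slope through these two observations
        b1 = (y1 - y2) / (x1 - x2)  # step (a)
        b0 = best_intercept([y - b1 * x for x, y in points], h, method)
        e = error(points, b0, b1, h, method)
        if best is None or e < best[0]:
            best = (e, b0, b1)
    return best[1], best[2]


# Eight points on y = 1 + 2x plus two gross outliers (made-up data).
pts = [(x, 1 + 2 * x) for x in range(8)] + [(8, 100), (9, -50)]
print(robust_line(pts, "LMS"))
print(robust_line(pts, "LTS"))
```

On these data both methods recover the line through the eight clean points,
since fewer than half of the observations are contaminated.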


We have restricted ourselves to the case of one explanatory variable and
one response variable. In the more general case of $r$ explanatory variables,
the same approach is applied. To properly determine the $r$ parameters, $r$
observations, instead of two observations, should be considered. Then, instead
of a line, a hyperplane is determined. Nevertheless, in such a general case
it is not always feasible to consider all subsets of $r$ observations, as
the number of such subsets is $\binom{n}{r} = n!/(r!(n - r)!)$. Random subsets
are then considered instead, but this implies that the method does not always
find the optimum of the robust regression problem.


Example 2.38. Let us consider the data sets represented in Table 2.4 and
Figure 2.6. The linear regression models of these data using the least sum of
squares (LSS), least median of squares (LMS), and least trimmed squares (LTS)
are represented in Figures 2.6, 2.8, and 2.9. They correspond to the following
models:


• Original data with no perturbation.
  LSS: revenue = -96923.393 + 49.164 * year
  LMS: revenue = -103671.6 + 52.56 * year
  LTS: revenue = -104159.0 + 52.8 * year

• Data with perturbation on the year.
  LSS: revenue = -40646.413 + 20.858 * year
  LMS: revenue = -106968.0 + 54.2 * year
  LTS: revenue = -106960.0 + 54.2 * year

• Data with perturbation on the revenues.
  LSS: revenue = -38453.75 + 19.807 * year
  LMS: revenue = -103671.6 + 52.56 * year
  LTS: revenue = -104159.0 + 52.8 * year

• Data with perturbation on both year and revenue.
  LSS: revenue = -107400.949 + 54.426 * year
  LMS: revenue = -103681.22 + 52.56 * year
  LTS: revenue = -104163.040 + 52.8 * year


Fig. 2.8. Data in Figure 2.6 with robust regression (LMS)

Fig. 2.9. Data in Figure 2.6 with robust regression (LTS)


2.3 Fuzzy Sets 


On another day, when a visitor came and inquired, 


“Is your honorable father in?” 


The son replied, 


“To a certain extent, yes; to a certain extent, no” 




Z. Zhuang, p. 284 [464] 


One of the basic properties of standard sets is that elements either belong
completely to the set or do not belong to the set at all. In this way, given a
property, for example, the property of “being odd,” and a number, we can
check whether or not the number is odd.


Fig. 2.10. Graphical representation of the membership function for tall: $\mu_{tall}$

The main characteristic of fuzzy sets is that membership is no longer a
Boolean property. Instead, membership is graded, and, accordingly, there are
different degrees of membership.

Formally, standard sets, known as crisp sets, can be defined in different
ways. One of them is in terms of characteristic functions. Given a reference
set $X$, a characteristic function $\chi$ is a function that labels each
element in $X$ as either belonging to the set or not. When we denote membership
in a set by 1, and nonmembership by 0, we have that $\chi$ is a function from
$X$ into $\{0, 1\}$.


Example 2.39. Let $X$ be the set of values on a die, and let $A$ denote the odd
values in $X$; then, $\chi_A : \{1, 2, 3, 4, 5, 6\} \to \{0, 1\}$ is defined as
follows:

$$\chi_A(x) = 1 \text{ if and only if } x \in \{1, 3, 5\}.$$


As stated, fuzzy sets permit degrees of membership. This is modeled us- 
ing membership functions with range [0, 1] instead of {0,1}. In this way, the 
membership permits a smooth transition from nonmembership (membership 
value equal to zero) to complete membership (membership value equal to one). 
Then, the larger the value, the larger the membership in a set. Accordingly, 
a membership function of a concept $A$ on a reference set $X$ is a function
$\mu_A$ from $X$ into $[0, 1]$.

A typical example of a fuzzy set is the set of tall heights (for people). It is
clear that someone with a height of 1.20 m is not tall, and that someone with a 
height of 2.00 m is very tall. Besides, it is clear that the degree of membership 
in the fuzzy set tall of someone with height equal to 1.75 m should be larger 
than the degree of membership of someone with height equal to 1.70 m. A 
possible definition for this fuzzy set is given below. 


Definition 2.40. Let $X$ be the set of possible heights
($X \subseteq \mathbb{R}$); then, we define the membership function
$\mu_{tall} : X \to [0, 1]$ as

$$\mu_{tall}(x) = \begin{cases} 0 & \text{if } x < 1.70 \\ (x - 1.70)/0.1 & \text{if } 1.70 \le x \le 1.80 \\ 1 & \text{if } x > 1.80. \end{cases}$$


Figure 2.10 shows a graphical representation of this membership function.


Fig. 2.11. Some typical membership functions: (a) Triangular; (b) L-shape; (c)
Γ-shape

Fuzzy sets are used to model a large variety of concepts. In particular, they
are used for representing the meanings of graded adjectives and adverbs, and
for computing with them. So, in some sense, they are used for computing
with words, which gives them practical applications. For example, they can
be used to model concepts related to distance (e.g., a value x is near a point,
such as near zero, near Barcelona), temperature (e.g., a room is warm or the
temperature of a device is high), cost (e.g., the trip from Tokyo to Shinjuku is
not cheap), time (e.g., the trip from Tokyo to Osaka takes around 2.5 hours,
the train arrived almost on time), and so on.

The use of fuzzy sets for describing concepts in real applications makes
them context dependent. For example, in the case of tall, some countries
would define the membership function $\mu_{tall}$ by

$$\mu_{tall}(x) = \begin{cases} 0 & \text{if } x < 1.60 \\ (x - 1.60)/0.1 & \text{if } 1.60 \le x \le 1.70 \\ 1 & \text{if } x > 1.70. \end{cases}$$


Similarly, the concept of being late for a date would be also context de- 
pendent. 

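To represent fuzzy concepts in a simple way, triangular membership functions
are often considered. They are defined in terms of three parameters
$(\alpha, \beta, \gamma)$, as follows.


Definition 2.41. Let $X$ be a reference set ($X \subseteq \mathbb{R}$), and let
$\alpha < \beta < \gamma$ in $X$; then, the triangular membership function
$\mu_{\alpha,\beta,\gamma}$ is defined as follows:

$$\mu_{\alpha,\beta,\gamma}(x) = \begin{cases} 0 & \text{if } x \le \alpha \text{ or } x \ge \gamma \\ (x - \alpha)/(\beta - \alpha) & \text{if } \alpha < x < \beta \\ (\gamma - x)/(\gamma - \beta) & \text{if } \beta \le x < \gamma. \end{cases}$$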
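

The membership functions defined so far translate directly into code. The
Python sketch below is illustrative only; the function names and the
parameters low and high (which capture the context dependence of tall) are our
own choices.

```python
def mu_tall(x, low=1.70, high=1.80):
    """Membership in 'tall'; low and high set the context (1.70/1.80 by default)."""
    if x < low:
        return 0.0
    if x > high:
        return 1.0
    return (x - low) / (high - low)


def triangular(alpha, beta, gamma):
    """Triangular membership function for parameters alpha < beta < gamma."""
    def mu(x):
        if x <= alpha or x >= gamma:
            return 0.0
        if x <= beta:
            return (x - alpha) / (beta - alpha)
        return (gamma - x) / (gamma - beta)
    return mu


print(mu_tall(1.20), mu_tall(1.75), mu_tall(2.00))
print(mu_tall(1.65, low=1.60, high=1.70))  # same height, different context

around_five = triangular(0.0, 5.0, 10.0)   # hypothetical concept "around 5"
print(around_five(2.5), around_five(5.0), around_five(7.5))
```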


Figure 2.11 represents a triangular membership function as well as two
other functions (L-shape and Γ-shape) that are often used in practical
applications.

We have defined the membership of a set to take values in $[0, 1]$.
Alternatively, to express partial membership it would be enough to evaluate the
elements in an arbitrary (partially) ordered set $L$ (e.g., a finite ordinal
scale).

In the next example we consider the definition of a few quantifiers. Fuzzy
quantifiers are fuzzy sets $\mu : [0, 1] \to [0, 1]$ whose domain corresponds to
the proportion of the elements that satisfy a property. So, if $Q_{>50}$
represents the quantifier “more than 50%,” we have that $Q_{>50}(x)$
corresponds to the membership of the proportion $x$ in the concept “more than
50%.”


Example 2.42. We consider the definition of four quantifiers, giving for each
of them their membership functions. They correspond to the concepts “there
exists,” “for all,” and “more than 50%.” For the last quantifier, a crisp and a
fuzzy definition are given.

Figure 2.12 gives a graphical representation of the quantifiers.


1. Quantifier “there exists”:

   $$Q_\exists(x) = \begin{cases} 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0. \end{cases}$$

   Note that when the proportion of elements is not zero, we have that the
   quantifier is completely satisfied.

2. Quantifier “for all”:

   $$Q_\forall(x) = \begin{cases} 1 & \text{if } x = 1 \\ 0 & \text{if } x < 1. \end{cases}$$

   In this case, we need all elements to be included in the proportion.
   Therefore, $Q_\forall(x) = 1$ if and only if $x = 1$.

3. Quantifier “more than 50%”:

   $$Q_{>50}(x) = \begin{cases} 0 & \text{if } x \in [0, 0.5) \\ 1 & \text{if } x \in [0.5, 1]. \end{cases}$$

4. Quantifier “more than 50%” (fuzzy definition):

   $$\tilde{Q}_{>50}(x) = \begin{cases} 0 & \text{if } x \in [0, 1/3) \\ 3(x - 1/3) & \text{if } x \in [1/3, 2/3] \\ 1 & \text{if } x \in (2/3, 1]. \end{cases}$$

   Note that $Q_{>50}$ is satisfied when $x \ge 1/2$. In fact, we have a crisp
   transition for $x = 1/2$. In contrast, in the case of the fuzzy quantifier
   $\tilde{Q}_{>50}$, there is a smooth transition from nonmembership to full
   membership. The transition starts with $x = 1/3$ and finishes with
   $x = 2/3$.


Fig. 2.12. Fuzzy quantifiers: (a) “for all”; (b) “there exists”; (c) “more than 50%”;
(d) “more than 50%” (fuzzy definition)
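

As a small illustration of how these quantifiers act on a proportion, the
following Python sketch encodes the four membership functions of Example 2.42
and applies them to a made-up list of values. The function names, the data,
and the property “greater than 7” are our own assumptions.

```python
def q_exists(x):
    return 0.0 if x == 0 else 1.0


def q_forall(x):
    return 1.0 if x == 1 else 0.0


def q_more_than_half(x):
    return 1.0 if x >= 0.5 else 0.0


def q_more_than_half_fuzzy(x):
    if x < 1 / 3:
        return 0.0
    if x > 2 / 3:
        return 1.0
    return 3 * (x - 1 / 3)


values = [3, 8, 12, 5, 9, 7]
proportion = sum(v > 7 for v in values) / len(values)  # 3 of the 6 values

for q in (q_exists, q_forall, q_more_than_half, q_more_than_half_fuzzy):
    print(q.__name__, q(proportion))
```

At a proportion of one half, the crisp quantifier is already fully satisfied,
whereas the fuzzy one sits in the middle of its transition.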


2.3.1 Operations on Fuzzy Sets 


The basic set theoretic operations defined on crisp sets (intersection, union,
and complement) have counterparts in fuzzy sets. These functions are the
fuzzy intersection, fuzzy union, and fuzzy complement. For the union and the
intersection we will consider two fuzzy sets $A$ and $B$ represented by the
membership functions $\mu_A$ and $\mu_B$. For the sake of simplicity, it is
assumed that both functions are defined on the same domain. That is,
$\mu_A : X \to [0, 1]$ and $\mu_B : X \to [0, 1]$.

We start by considering the definition of the fuzzy intersection, that is, the
function that computes the intersection of two fuzzy sets. The output of this
function is naturally another fuzzy set, and this fuzzy set is represented by
its membership function. Thus, given two fuzzy sets $A$ and $B$ represented by
their membership functions $\mu_A$ and $\mu_B$, the fuzzy intersection permits
us to construct a new fuzzy set $A \cap B$ represented by its membership
function $\mu_{A \cap B}$. More specifically, $\mu_{A \cap B}$ is defined for
each $x$ as a function $T$ of $\mu_A$ and $\mu_B$ on $x$. Formally,

$$\mu_{A \cap B}(x) = T(\mu_A(x), \mu_B(x)).$$


This function T is the t-norm. Note that T takes two values in [0, 1] and 
returns another one in the same interval. 

In order to have an intersection consistent with the one on crisp sets, the
function $T$ should satisfy $T(0, 0) = T(0, 1) = T(1, 0) = 0$ and
$T(1, 1) = 1$. All t-norm functions satisfy these properties, as will be seen
below. Nevertheless, other properties are also required.

The formal definition of a t-norm is given below:


Definition 2.43. A function $T : [0, 1] \times [0, 1] \to [0, 1]$ is a t-norm
if and only if it satisfies the following properties:


(i) $T(x, y) = T(y, x)$ (symmetry or commutativity)

(ii) $T(T(x, y), z) = T(x, T(y, z))$ (associativity)

(iii) $T(x, y) \le T(x', y')$ if $x \le x'$ and $y \le y'$ (monotonicity)

(iv) $T(x, 1) = x$ for all $x$ (neutral element 1)


Additionally, t-norms are often required to satisfy continuity and
subidempotency ($T(x, x) < x$ for $x \in (0, 1)$). Such t-norms are called
Archimedean t-norms.


Definition 2.44. A continuous t-norm satisfying subidempotency
($T(x, x) < x$ for all $x \in (0, 1)$) is an Archimedean t-norm.


While in crisp sets there is only a single function for intersection, this is 
not the case for fuzzy sets. We consider below some examples. 


Example 2.45. The following functions are t-norms.


Minimum: $T(x, y) = \min(x, y)$. The minimum is often denoted by $\wedge$. That
is, $x \wedge y = \min(x, y)$. We will use this notation in this book.

Algebraic product: $T(x, y) = xy$.

Bounded difference/Lukasiewicz: $T(x, y) = \max(0, x + y - 1)$.

Yager family: $T_w(x, y) = 1 - \min(1, ((1 - x)^w + (1 - y)^w)^{1/w})$ for
$w > 0$.
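

The four t-norms of Example 2.45 are easy to implement, and the axioms of
Definition 2.43 can be spot-checked on a finite grid. The Python sketch below
is ours; a grid check with a numerical tolerance is only a sanity test, not a
proof, and the function names are hypothetical.

```python
def t_min(x, y):
    return min(x, y)


def t_prod(x, y):
    return x * y


def t_luk(x, y):
    return max(0.0, x + y - 1.0)


def t_yager(w):
    def t(x, y):
        return 1.0 - min(1.0, ((1 - x) ** w + (1 - y) ** w) ** (1 / w))
    return t


GRID = [i / 10 for i in range(11)]


def is_tnorm_on_grid(t, tol=1e-9):
    """Spot-check neutral element, symmetry, associativity, and monotonicity."""
    for x in GRID:
        if abs(t(x, 1.0) - x) > tol:                          # neutral element 1
            return False
        for y in GRID:
            if abs(t(x, y) - t(y, x)) > tol:                  # symmetry
                return False
            for z in GRID:
                if abs(t(t(x, y), z) - t(x, t(y, z))) > tol:  # associativity
                    return False
                if y <= z and t(x, y) > t(x, z) + tol:        # monotonicity
                    return False
    return True


for name, t in [("min", t_min), ("product", t_prod),
                ("Lukasiewicz", t_luk), ("Yager w=2", t_yager(2))]:
    print(name, is_tnorm_on_grid(t))

# The arithmetic mean is not a t-norm: 1 is not its neutral element.
print("mean", is_tnorm_on_grid(lambda x, y: (x + y) / 2))
```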


All the t-norms are proper generalizations for conjunctions on crisp sets, 
as, for all of them, T(0,0) = T(0,1) = T(1,0) = 0 and T(1,1) = 1. In fact, 
these equalities follow from Definition 2.43. The study of t-norms has led to 
several characterizations. One of them is as follows. 


Theorem 2.46. $T : [0, 1] \times [0, 1] \to [0, 1]$ is an Archimedean t-norm if
and only if there exists a continuous strictly decreasing function $f$ from
$[0, 1]$ to $\mathbb{R}$ with $f(1) = 0$ such that

$$T(x, y) = f^{(-1)}(f(x) + f(y)) \qquad (2.9)$$

for all $x, y \in [0, 1]$.


This function $f$ is called a decreasing generator, and
$f^{(-1)} : \mathbb{R} \to [0, 1]$ corresponds to its quasi-inverse (or
pseudo-inverse). Such a quasi-inverse is defined as

$$f^{(-1)}(x) = \begin{cases} 1 & \text{if } x \in (-\infty, 0) \\ f^{-1}(x) & \text{if } x \in [0, f(0)] \\ 0 & \text{if } x \in (f(0), \infty). \end{cases}$$

The definition of fuzzy union, the function to compute the union of fuzzy
sets and to model disjunction, follows a pattern similar to the one of fuzzy
intersection. Here, we will use a function $\perp$ that is known as a t-conorm.
The same properties considered for the t-norm apply here, except for the
neutral element. In the case of union, the neutral element is zero. The
requirements for t-conorms are formalized below.


Definition 2.47. A function $\perp : [0, 1] \times [0, 1] \to [0, 1]$ is a
t-conorm if and only if it satisfies the following properties:


(i) $\perp(x, y) = \perp(y, x)$ (symmetry or commutativity)

(ii) $\perp(\perp(x, y), z) = \perp(x, \perp(y, z))$ (associativity)

(iii) $\perp(x, y) \le \perp(x', y')$ if $x \le x'$ and $y \le y'$ (monotonicity)

(iv) $\perp(x, 0) = x$ for all $x$ (neutral element 0)


So, given two fuzzy sets $A$ and $B$, their union $A \cup B$ is represented by
the membership function $\mu_{A \cup B}(x) = \perp(\mu_A(x), \mu_B(x))$.


Example 2.48. The following functions are t-conorms.


Maximum: $\perp(x, y) = \max(x, y)$. The maximum is often denoted by $\vee$.
That is, $x \vee y = \max(x, y)$. We will use this notation in this book.

Algebraic sum: $\perp(x, y) = x + y - xy$.

Bounded sum/Lukasiewicz: $\perp(x, y) = \min(1, x + y)$.

Yager: $\perp_w(x, y) = \min(1, (x^w + y^w)^{1/w})$ for $w > 0$.

Sugeno: $\perp_\lambda(x, y) = \min(1, x + y + \lambda x y)$ for $\lambda > -1$.


For the sake of completeness, we give below a result concerning the
representation of Archimedean t-conorms. A t-conorm is Archimedean when it is a
continuous superidempotent t-conorm, where superidempotency means that
$\perp(x, x) > x$ for all $x \in (0, 1)$. The result is analogous to the one
given for t-norms.


Theorem 2.49. $\perp : [0, 1] \times [0, 1] \to [0, 1]$ is an Archimedean
t-conorm if and only if there exists a continuous strictly increasing function
$g : [0, 1] \to \mathbb{R}$ with $g(0) = 0$ such that

$$\perp(x, y) = g^{(-1)}(g(x) + g(y)) \qquad (2.10)$$

for all $x, y \in [0, 1]$.

In this case, $g$ is known as an increasing generator, and its quasi-inverse
is defined by (here, $g^{-1}$ is the inverse of $g$)

$$g^{(-1)}(x) = \begin{cases} 0 & \text{if } x \in (-\infty, 0) \\ g^{-1}(x) & \text{if } x \in [0, g(1)] \\ 1 & \text{if } x \in (g(1), \infty). \end{cases}$$


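To illustrate, we give below a generator for the Sugeno t-conorm.


Example 2.50. The increasing generator of
$\perp(x, y) = \min(1, x + y + \lambda x y)$ (for $\lambda > -1$) is

$$g_\lambda(x) = \ln(1 + \lambda x) / \ln(1 + \lambda). \qquad (2.11)$$

The inverse of $g_\lambda$ is the function

$$g_\lambda^{-1}(x) = \frac{1}{\lambda} \left( e^{x \ln(1 + \lambda)} - 1 \right). \qquad (2.12)$$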
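

Equations 2.10 to 2.12 can be checked numerically: composing the generator
with its quasi-inverse should reproduce the Sugeno t-conorm. The Python sketch
below is ours and assumes $\lambda \neq 0$, since the generator as written is
undefined for $\lambda = 0$; the function names are hypothetical.

```python
import math


def sugeno_conorm(lam):
    return lambda x, y: min(1.0, x + y + lam * x * y)


def sugeno_from_generator(lam):
    """Sugeno t-conorm rebuilt from its generator; assumes lam > -1 and lam != 0."""
    def g(x):
        return math.log(1 + lam * x) / math.log(1 + lam)

    def g_inv(v):
        return (math.exp(v * math.log(1 + lam)) - 1) / lam

    def conorm(x, y):
        v = g(x) + g(y)  # never negative, so only the upper cut is needed
        return 1.0 if v > g(1.0) else g_inv(v)

    return conorm


grid = [i / 20 for i in range(21)]
for lam in (2.0, -0.5):
    direct, rebuilt = sugeno_conorm(lam), sugeno_from_generator(lam)
    gap = max(abs(direct(x, y) - rebuilt(x, y)) for x in grid for y in grid)
    print(lam, gap)  # the gap should be at rounding-error level
```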


Now, we define a subtraction operator $-_\perp$ for a t-conorm $\perp$.


Definition 2.51. Let $\perp$ be a t-conorm; then, the operation $-_\perp$ on
$[0, 1]^2$ is defined by

$$x -_\perp y := \inf\{z \mid \perp(y, z) \ge x\}.$$

For Archimedean t-conorms and for the maximum, this equation can be
rewritten as follows.


(i) If $\perp$ is an Archimedean t-conorm with generator $g$, then

$$x -_\perp y = g^{(-1)}(g(x) - g(y)).$$

(ii) If $\perp$ is equal to the maximum, then

$$x -_{\max} y = \begin{cases} x & \text{if } x > y \\ 0 & \text{if } x \le y. \end{cases}$$

Next, we define in a similar way the operator $I_T(x, y)$ for a t-norm $T$.


Definition 2.52. Let $T$ be a t-norm; then, the operation $I_T(x, y)$ on
$[0, 1]^2$ is defined by

$$I_T(x, y) := \sup\{z \in [0, 1] \mid T(x, z) \le y\}. \qquad (2.13)$$


This operation is known as the $T$-residuum.
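
Both operators are defined through an infimum or a supremum, so they can be
approximated by a search over a finite grid. The Python sketch below is ours:
it uses exact fractions with step 1/100 (an arbitrary choice) to avoid
rounding issues, and the function names are hypothetical. On this grid the
search reproduces the closed forms $\max(0, x - y)$ for the bounded sum and
$\min(1, 1 - x + y)$ for the Lukasiewicz t-norm.

```python
from fractions import Fraction

GRID = [Fraction(i, 100) for i in range(101)]


def subtract(conorm, x, y):
    """inf{z | conorm(y, z) >= x}, searched over GRID."""
    return min(z for z in GRID if conorm(y, z) >= x)


def residuum(tnorm, x, y):
    """sup{z | tnorm(x, z) <= y}, searched over GRID."""
    return max(z for z in GRID if tnorm(x, z) <= y)


def bounded_sum(x, y):
    return min(Fraction(1), x + y)


def t_luk(x, y):
    return max(Fraction(0), x + y - 1)


x, y = Fraction(7, 10), Fraction(3, 10)
print(subtract(bounded_sum, x, y))           # compare with max(0, x - y)
print(subtract(max, x, y), subtract(max, y, x))
print(residuum(t_luk, x, y))                 # compare with min(1, 1 - x + y)
print(residuum(min, x, y), residuum(min, y, x))
```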

Fuzzy complements for fuzzy sets follow a similar pattern. In this case,
however, the operation is unary. Therefore, the complement of a fuzzy set $A$
is defined in terms of a function $\mathrm{neg} : [0, 1] \to [0, 1]$, as
follows:

$$\mu_{\neg A}(x) = \mathrm{neg}(\mu_A(x)).$$

This function, known as negation, is defined as follows.


Definition 2.53. A function $\mathrm{neg} : [0, 1] \to [0, 1]$ is a negation if
it satisfies:


(i) $\mathrm{neg}(0) = 1$ and $\mathrm{neg}(1) = 0$ (boundary conditions)
(ii) $x < y$ implies $\mathrm{neg}(x) > \mathrm{neg}(y)$ (order reversing)
(iii) $\mathrm{neg}(\mathrm{neg}(x)) = x$ for all $x$ (negation is involutive)


We now present some examples:


Example 2.54. The functions listed below satisfy the requirements of a
complement.


Standard: $\mathrm{neg}(x) = 1 - x$.
Yager: $\mathrm{neg}_w(x) = (1 - x^w)^{1/w}$ for $w > 0$.
Sugeno: $\mathrm{neg}_\lambda(x) = (1 - x)/(1 + \lambda x)$ for $\lambda > -1$.
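

The three conditions of Definition 2.53 can be spot-checked numerically for
the negations of Example 2.54. The Python sketch below is ours; it tests the
conditions on a grid with a tolerance, which is a sanity check rather than a
proof, and the counterexample $1 - x^2$ is our own.

```python
def neg_standard(x):
    return 1.0 - x


def neg_yager(w):
    return lambda x: (1.0 - x ** w) ** (1.0 / w)


def neg_sugeno(lam):
    return lambda x: (1.0 - x) / (1.0 + lam * x)


def is_negation_on_grid(neg, steps=100, tol=1e-9):
    """Spot-check boundary conditions, order reversal, and involution."""
    xs = [i / steps for i in range(steps + 1)]
    if abs(neg(0.0) - 1.0) > tol or abs(neg(1.0)) > tol:
        return False
    if any(neg(a) <= neg(b) for a, b in zip(xs, xs[1:])):
        return False
    return all(abs(neg(neg(x)) - x) <= tol for x in xs)


print(is_negation_on_grid(neg_standard))
print(is_negation_on_grid(neg_yager(2)))
print(is_negation_on_grid(neg_sugeno(2.0)))
# 1 - x^2 is decreasing with the right boundary values, but it is not involutive.
print(is_negation_on_grid(lambda x: 1.0 - x ** 2))
```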


Now, we give a characterization of negations.


Proposition 2.55. A function $\mathrm{neg} : [0, 1] \to [0, 1]$ is a negation
if and only if neg is of the form

$$\mathrm{neg}(x) = h^{-1}(1 - h(x))$$

for all $x$ in $[0, 1]$, for a strictly increasing function $h$ from $[0, 1]$
to $[0, 1]$ with $h(0) = 0$ and $h(1) = 1$.


Although the operations over fuzzy sets satisfy the properties we have
detailed above, it is not true that they satisfy all properties satisfied by
union, intersection, and complement in classical set theory. For some
properties, only a few families satisfy them. This is the case with the law of
excluded middle, the law of contradiction, and the law of De Morgan. These laws
are formulated for fuzzy sets as $\perp(a, \mathrm{neg}(a)) = 1$,
$T(a, \mathrm{neg}(a)) = 0$, and
$\mathrm{neg}(\perp(a, b)) = T(\mathrm{neg}(a), \mathrm{neg}(b))$ (for all
$a, b \in [0, 1]$). Among the existing operators,
$\perp(x, y) = \min(1, x + y)$, $T(x, y) = \max(0, x + y - 1)$, and
$\mathrm{neg}(x) = 1 - x$ satisfy the excluded middle and the law of
contradiction but not idempotency ($T(x, x) = x$, $\perp(x, x) = x$). In
contrast, $\perp = \max$, $T = \min$, and $\mathrm{neg}(x) = 1 - x$, which
satisfy idempotency, do not satisfy the excluded middle or the law of
contradiction.
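
The claims about these two triples of operators can be verified on a grid. The
Python sketch below is ours: it uses the standard negation, checks each law at
all grid points up to a small tolerance, and reports which laws hold. The
helper names holds and laws are hypothetical.

```python
grid = [i / 20 for i in range(21)]


def neg(x):
    return 1.0 - x


def holds(violation, tol=1e-9):
    """True if the violation measure stays within tol at every grid point."""
    return all(violation(a, b) <= tol for a in grid for b in grid)


def laws(t, s):
    """Which classical laws hold on the grid for the t-norm t and the t-conorm s?"""
    return {
        "excluded middle": holds(lambda a, b: abs(s(a, neg(a)) - 1.0)),
        "contradiction": holds(lambda a, b: abs(t(a, neg(a)))),
        "De Morgan": holds(lambda a, b: abs(neg(s(a, b)) - t(neg(a), neg(b)))),
        "idempotency": holds(lambda a, b: abs(t(a, a) - a) + abs(s(a, a) - a)),
    }


luk = laws(lambda x, y: max(0.0, x + y - 1.0), lambda x, y: min(1.0, x + y))
mm = laws(min, max)
print("Lukasiewicz pair:", luk)
print("min/max pair:    ", mm)
```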

Negation functions permit us to define the dual of any binary operator. 





Definition 2.56. Let $B$ be a binary operator on $[0, 1] \times [0, 1]$; then,
its dual operator $\bar{B}$ with respect to a negation function neg is defined,
for all $x, y \in [0, 1]$, as follows:

$$\bar{B}(x, y) := \mathrm{neg}(B(\mathrm{neg}(x), \mathrm{neg}(y))).$$


An order can be defined on binary operators. This order, denoted by $\le$,
permits us to classify the operators. The definition is based on a pointwise
ordering.


Definition 2.57. Let $B_1$ and $B_2$ be two binary operators
$[0, 1] \times [0, 1] \to [0, 1]$; then, we say that $B_1 \le B_2$ if, for all
$x, y \in [0, 1]$, we have $B_1(x, y) \le B_2(x, y)$.


When, for two operators $B_1$ and $B_2$, either $B_1 \le B_2$ or
$B_2 \le B_1$, the two operators are said to be comparable. Although this
definition is given for binary operators, it can be applied to $N$-dimensional
operators.

From this definition, it follows that $\min \le \max$. This definition can be
used to characterize the parameters of some families of norms. For example,
in Yager's family of t-conorms, we have that if we consider two values $\alpha$
and $\beta$ such that $\alpha \le \beta$, then $\perp_\beta \le \perp_\alpha$.

Additionally, we have the following proposition: 


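Proposition 2.58. For all t-norms $T$ and all t-conorms $\perp$, we have:

$$T_{\min} \le T \le \min$$

$$\max \le \perp \le \perp_{\max}$$

Here, $\perp_{\max}$ and $T_{\min}$ stand for the following operations:

$$\perp_{\max}(x, y) = \begin{cases} x & \text{if } y = 0 \\ y & \text{if } x = 0 \\ 1 & \text{otherwise.} \end{cases} \qquad (2.14)$$

$$T_{\min}(x, y) = \begin{cases} x & \text{if } y = 1 \\ y & \text{if } x = 1 \\ 0 & \text{otherwise.} \end{cases} \qquad (2.15)$$


Fig. 2.13. t-norms, binary aggregation operators, and t-conorms (axis labels:
$T_{\min}$, min, max, $\perp_{\max}$)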

In the light of the previous definition, Equation 1.1 (which establishes that
aggregation operators $C$ are functions that yield a value between the minimum
and the maximum of the input values) can be expressed equivalently for two
inputs as

$$\min \le C \le \max.$$

Therefore, putting all these results together, we have the following
property:

$$T_{\min} \le T \le \min \le C \le \max \le \perp \le \perp_{\max}.$$





That is, aggregation operators, t-norms, and t-conorms define different 
regions in the space of binary operators. t-norms are conjunctive, t-conorms 
are disjunctive, and aggregation operators are compensatory operators. These 
regions are illustrated in Figure 2.13. 
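
The pointwise order and the chain of inequalities can be checked on a grid.
In the Python sketch below, which is ours, the algebraic product, the
arithmetic mean, and the algebraic sum stand in for a generic t-norm,
aggregation operator, and t-conorm. The same helper also checks the ordering
of Yager's t-conorms. All function names are hypothetical.

```python
grid = [i / 20 for i in range(21)]


def leq(b1, b2, tol=1e-12):
    """Pointwise order between two binary operators, checked on the grid."""
    return all(b1(x, y) <= b2(x, y) + tol for x in grid for y in grid)


def t_weakest(x, y):    # T_min of Equation 2.15
    return x if y == 1 else (y if x == 1 else 0.0)


def s_strongest(x, y):  # the bounding t-conorm of Equation 2.14
    return x if y == 0 else (y if x == 0 else 1.0)


chain = [
    t_weakest,
    lambda x, y: x * y,          # a t-norm (algebraic product)
    min,
    lambda x, y: (x + y) / 2,    # an aggregation operator (arithmetic mean)
    max,
    lambda x, y: x + y - x * y,  # a t-conorm (algebraic sum)
    s_strongest,
]
ordered = all(leq(a, b) for a, b in zip(chain, chain[1:]))
print(ordered)


def yager_conorm(w):
    return lambda x, y: min(1.0, (x ** w + y ** w) ** (1 / w))


# A larger parameter gives a smaller t-conorm.
print(leq(yager_conorm(3), yager_conorm(1)), leq(yager_conorm(1), yager_conorm(3)))
```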


2.3.2 Implications 


Among other existing operations on fuzzy sets, we highlight fuzzy
implications. There exist several approaches to define them, based on different
definitions of classical logic. One of them is based on the equivalence of the
implication $a \to b$ with $\neg a \vee b$. Implications of this form are known
as $\perp$-implications, as they are based on a t-conorm (the one used to model
the disjunction in $\neg a \vee b$).


Definition 2.59. A binary operator $I : [0, 1] \times [0, 1] \to [0, 1]$ is a
fuzzy $\perp$-implication if it can be expressed as

$$I(x, y) = \perp(\mathrm{neg}(x), y). \qquad (2.16)$$


The Kleene-Dienes implication is one of the $\perp$-implications. Its
definition is based on $\perp = \max$ and $\mathrm{neg}(x) = 1 - x$. Therefore,
the Kleene-Dienes implication corresponds to

$$I(x, y) = \max(1 - x, y). \qquad (2.17)$$

The Lukasiewicz implication is another example (using bounded sum and
$\mathrm{neg}(x) = 1 - x$):

$$I(x, y) = \min(1, 1 - x + y).$$

R-implications (R for residuated) represent another family of implications.
They extend to the fuzzy setting the definition in classical logic that an
implication can be defined by

$$I(x, y) = \max\{z \in \{0, 1\} \mid x \wedge z \le y\}. \qquad (2.18)$$


Definition 2.60. A binary operator $I : [0, 1] \times [0, 1] \to [0, 1]$ is a
fuzzy R-implication if it can be expressed using Equation 2.13 for a t-norm
$T$:

$$I_T(x, y) := \sup\{z \in [0, 1] \mid T(x, z) \le y\}.$$


Although in the crisp setting Equations 2.16 and 2.18 are equivalent, in
the fuzzy setting R-implications and $\perp$-implications do not define the
same set. Nevertheless, there are some implications, such as the Lukasiewicz
one, that are both R- and $\perp$-implications.
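
The relation between the two families can be illustrated numerically. In the
Python sketch below, which is ours, implications of the form of Equation 2.16
are built from a t-conorm and the standard negation, and are compared with two
R-implications written in closed form: the residuum of the minimum (commonly
called the Gödel implication, a name not used in the text above) and the
residuum of the Lukasiewicz t-norm. The closed forms are standard results that
we take as given; the function names are hypothetical.

```python
def neg(x):
    return 1.0 - x


def conorm_implication(conorm):
    """I(x, y) = conorm(neg(x), y), as in Equation 2.16, with the standard negation."""
    return lambda x, y: conorm(neg(x), y)


kleene_dienes = conorm_implication(max)
lukasiewicz = conorm_implication(lambda x, y: min(1.0, x + y))


def residuum_of_min(x, y):
    """Closed form of sup{z | min(x, z) <= y}."""
    return 1.0 if x <= y else y


def residuum_of_lukasiewicz(x, y):
    """Closed form of sup{z | max(0, x + z - 1) <= y}."""
    return min(1.0, 1.0 - x + y)


grid = [i / 20 for i in range(21)]
same = all(abs(lukasiewicz(x, y) - residuum_of_lukasiewicz(x, y)) < 1e-12
           for x in grid for y in grid)
print(same)                                           # both constructions coincide
print(kleene_dienes(0.5, 0.5), residuum_of_min(0.5, 0.5))  # these two differ
```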


2.3.3 Fuzzy Relations 


Fuzzy relations are a generalization of crisp relations. We will consider a few 
definitions regarding finite sets of reference. 


Definition 2.61. Let $X_1, X_2, \ldots, X_N$ be reference sets; then,
$R \subseteq X_1 \times X_2 \times \cdots \times X_N$ is a crisp relation and a
fuzzy set on $X_1 \times X_2 \times \cdots \times X_N$ is a fuzzy relation.


Among the existing operations on fuzzy relations, we are interested in the
composition of binary relations. Such relations correspond to Definition 2.61
with $N = 2$. Composition is defined below.


Definition 2.62. Let $R$ be a fuzzy relation on $X_1 \times X_2$, and let $S$
be a fuzzy relation on $X_2 \times X_3$; then, the $\perp$-$T$ composition of
$R$ and $S$ is a new relation on $X_1 \times X_3$, denoted by $R \circ S$ and
defined by

$$(R \circ S)(x_1, x_3) = \perp_{x \in X_2} T(R(x_1, x), S(x, x_3))$$

for all $x_1$ in $X_1$ and all $x_3$ in $X_3$.


In the particular case of L = max and T = min, we get the max-min 
composition. This is defined as follows. 


Definition 2.63. The standard composition for fuzzy relations corresponds to
the max-min composition. That is, the max-min composition of $R$ and $S$ is
$R \circ S$, with

$$(R \circ S)(x_1, x_3) = \max_{x \in X_2} \min(R(x_1, x), S(x, x_3))$$

for all $x_1$ in $X_1$ and all $x_3$ in $X_3$.


Let us consider an example of the max-min composition.


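Example 2.64. Let $R_1 : X_1 \times X_2 \to [0, 1]$ and
$R_2 : X_2 \times X_3 \to [0, 1]$ be two fuzzy relations defined as follows:

$$R_1 := \begin{pmatrix} 0 & 0.2 & 0.8 & 1 & 0.5 & 0.2 & 0 & 0 \end{pmatrix}$$

$$R_2 := \begin{pmatrix}
0 & 0 & 0.1 & 0.1 \\
0 & 0 & 0.2 & 0.1 \\
0 & 0.1 & 0.3 & 0.1 \\
0 & 0.2 & 0.8 & 0.2 \\
0 & 0.3 & 1.0 & 0.1 \\
0.1 & 0.4 & 0.5 & 0 \\
0 & 0.1 & 0.2 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}$$

Then, if $R$ is the max-min composition of $R_1$ and $R_2$, we have

$$R = R_1 \circ R_2 = \begin{pmatrix} 0.1 & 0.3 & 0.8 & 0.2 \end{pmatrix}$$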
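

The composition in Example 2.64 can be reproduced with a short Python sketch.
The code below is ours: relations are stored as nested lists, and the function
compose takes the t-conorm and t-norm as parameters so that other compositions
can be tried as well (max and min are the defaults).

```python
def compose(r, s, conorm=max, tnorm=min):
    """Composition of two fuzzy relations given as nested lists (max-min by default)."""
    result = []
    for i in range(len(r)):
        row = []
        for k in range(len(s[0])):
            acc = tnorm(r[i][0], s[0][k])
            for j in range(1, len(s)):
                acc = conorm(acc, tnorm(r[i][j], s[j][k]))
            row.append(acc)
        result.append(row)
    return result


R1 = [[0.0, 0.2, 0.8, 1.0, 0.5, 0.2, 0.0, 0.0]]
R2 = [
    [0.0, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.2, 0.1],
    [0.0, 0.1, 0.3, 0.1],
    [0.0, 0.2, 0.8, 0.2],
    [0.0, 0.3, 1.0, 0.1],
    [0.1, 0.4, 0.5, 0.0],
    [0.0, 0.1, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

print(compose(R1, R2))  # [[0.1, 0.3, 0.8, 0.2]]
```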


2.3.4 Truth Degrees 


It is known that, in the crisp setting, there is a tight relationship between
sets and classical logic, so that intersection corresponds to conjunction,
union to disjunction, and complement to negation. This also occurs in the fuzzy
setting. Here, fuzzy intersection, fuzzy union, and fuzzy complement can be
used to model conjunction, disjunction, and negation for fuzzy predicates. In
particular, when we have truth degrees $\tau(p_1)$ and $\tau(p_2)$ to denote
the degree of truth of predicates $p_1$ and $p_2$, then we can combine them by
means of t-norms, t-conorms, and negations to denote the degree of truth of
composite predicates.


Example 2.65. Let us consider the following two rules: 


1) If Barcelona is near and the ticket is cheap, then we visit La Sagrada 
Familia. 


2) If Tokyo is near and we are not tired, then we visit Miraikan. 


Let $a$, $b$, $c$, and $d$ be the truth degrees for “Barcelona is near,” “the ticket
to Barcelona is cheap,” “Tokyo is near,” and “we are tired.” Then, the truth
degree of the antecedent of the first rule would be $\top(a, b)$ and of the second
rule would be $\top(c, \mathrm{neg}(d))$.


When the predicates include a vague term, as in the case of “Barcelona is
near,” fuzzy sets can be used to determine the truth degree of the predicate.
This is illustrated in the following example.


Example 2.66. Let us consider the concept “near” described by the following
fuzzy set:

$$\mu_{near}(x) = \begin{cases} 0 & \text{if } |x| > 50\,\text{km} \\ (50 - x)/50 & \text{if } |x| \le 50\,\text{km.} \end{cases}$$


Then, if our actual position is Bellaterra and the distance between this
town and Barcelona (Plaça Catalunya) is 24 km, we have that the truth
degree of “Barcelona is near” can be computed as follows:

$$\tau(\text{Barcelona is near}) = \mu_{near}(\text{distance}(\text{Barcelona}, \text{actual position}))$$
$$= \mu_{near}(\text{distance}(\text{Barcelona}, \text{Bellaterra}))$$
$$= \mu_{near}(24\,\text{km}) = (50 - 24)/50 = 0.52.$$


In general, if we have a predicate “$x$ is $A$” and a membership function $\mu_A$ to
represent the concept $A$, then the truth degree $\tau(x \text{ is } A)$, given that $x$ equals
$x_0$, is defined as

$$\tau(x \text{ is } A) = \mu_A(x_0).$$
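
Examples 2.65 and 2.66 can be combined in a small computation. The sketch below is illustrative only: the minimum t-norm and the negation $1 - a$ are just one possible choice of operators, the value 0.7 for “the ticket is cheap” is a made-up number, and the membership function uses $|x|$ in the numerator so that it is also well behaved for negative arguments (for distances, which are nonnegative, it coincides with the definition in Example 2.66).

```python
def mu_near(distance_km):
    """Membership function of "near" (Example 2.66), in kilometres."""
    d = abs(distance_km)
    return 0.0 if d > 50 else (50 - d) / 50


def t_norm_min(a, b):
    """Minimum t-norm, one possible model of conjunction."""
    return min(a, b)


def neg(a):
    """Negation 1 - a, one possible model of negation."""
    return 1 - a


a = mu_near(24)  # truth degree of "Barcelona is near" from Bellaterra
b = 0.7          # hypothetical truth degree of "the ticket is cheap"

print(a)                 # 0.52
print(t_norm_min(a, b))  # 0.52, antecedent of the first rule
```

With another t-norm, such as the product, the antecedent of the first rule would receive a different truth degree for the same $a$ and $b$.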

Truth degrees have also been defined for quantified predicates. In this case,
we have expressions of the form “$Q$ $A$'s are $B$'s,” where $Q$ is a quantifier, such
as for all, most, or some. The computation of the truth degree of such an
expression assumes that there is an interpretation of $Q$ in terms of a fuzzy
set, and we evaluate the proportion of $B$ in $A$ with such a fuzzy set. This is
formally established below. To do so, we first need some definitions.


Definition 2.67. Let $\mu_A$ be a fuzzy set on a finite reference set $X$ representing
the concept $A$; then, the $\Sigma\mathrm{count}$ of $A$ is defined by

$$\Sigma\mathrm{count}(A) = \sum_{x \in X} \mu_A(x).$$

Let $\mu_A$ and $\mu_B$ be two fuzzy sets on a finite reference set $X$ representing the
concepts $A$ and $B$; then, $\Sigma\mathrm{count}(B|A)$ is defined by

$$\Sigma\mathrm{count}(B|A) = \frac{\Sigma\mathrm{count}(A \cap B)}{\Sigma\mathrm{count}(A)}.$$


$\Sigma\mathrm{count}(B|A)$ is a relative measure of the cardinalities of $A$ and $B$. It
corresponds to the proportion of $B$ in $A$, that is, the proportion of the elements
of $A$ that are also in $B$. Note that, when all elements in $A$
are in $B$, we have $\Sigma\mathrm{count}(B|A) = 1$, and, in contrast, when no element in $A$
is in $B$, $\Sigma\mathrm{count}(B|A) = 0$. Note that this definition is similar to the one of
conditional probabilities (see Equation 2.1).


Definition 2.68. Let $A$ and $B$ be two fuzzy sets and let $Q$ be a fuzzy quantifier.
Then, the truth degree of the statement “$Q$ $A$'s are $B$'s” is defined by

$$\tau(Q\ A\text{'s are } B\text{'s}) = Q(\Sigma\mathrm{count}(B|A)).$$


Note that the sentence “$Q$ $A$'s are $B$'s” is interpreted as “$\Sigma\mathrm{count}(B|A)$ is $Q$.”

Now, we consider an example of computing the truth degree of a statement
with fuzzy quantifiers.


Example 2.69. Let us consider the evaluation of the statement

“More than 50% of the students are supporters of F.C. Barcelona”

in a given class of 25 students, with 15 supporters of the team, and with the
fuzzy quantifier “more than 50%” defined as in Example 2.42:

$$Q_{>50}(x) = \begin{cases} 0 & \text{if } x \in [0, 1/3) \\ 3(x - 1/3) & \text{if } x \in [1/3, 2/3] \\ 1 & \text{if } x \in (2/3, 1] \end{cases}$$

Then, with these definitions, the truth degree of the statement given above
is:

$$Q_{>50}(\Sigma\mathrm{count}(B|A)) = Q_{>50}\left(\frac{\Sigma\mathrm{count}(A \cap B)}{\Sigma\mathrm{count}(A)}\right) = Q_{>50}\left(\frac{15}{25}\right) = 0.8.$$

In this example, we have used $\Sigma\mathrm{count}(A) = 25$ (students in the class) and
$\Sigma\mathrm{count}(A \cap B) = 15$ (students in the class who are supporters). Instead of
crisp supporters, we could envision fuzzy supporters. In this case, we would
consider a membership function giving the support degree for each student,
and then $\Sigma\mathrm{count}(A \cap B)$ would be computed as the summation of this function
for all the students in the class. That is, let $X = A = \{x_1, x_2, \ldots, x_{25}\}$ be the
25 students and let $\mu_B(x_i)$ be the degree to which $x_i$ supports the Barcelona team.
Then, $\Sigma\mathrm{count}(A \cap B)$ is defined by $\sum_{x_i \in X} \mu_B(x_i)$.
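
The computation in Example 2.69 can be reproduced with a few lines of code. This is a sketch under stated assumptions: fuzzy sets on a finite reference set are stored as lists of membership degrees, the intersection is modeled with the minimum, and the function names are hypothetical.

```python
def sigma_count(mu):
    """Sigma-count of a fuzzy set given as a list of membership degrees."""
    return sum(mu)


def sigma_count_conditional(mu_a, mu_b):
    """Sigma-count(B|A), with the intersection modeled by the minimum."""
    intersection = [min(a, b) for a, b in zip(mu_a, mu_b)]
    return sigma_count(intersection) / sigma_count(mu_a)


def q_more_than_half(x):
    """Fuzzy quantifier "more than 50%" as defined in Example 2.69."""
    if x < 1 / 3:
        return 0.0
    if x <= 2 / 3:
        return 3 * (x - 1 / 3)
    return 1.0


mu_students = [1.0] * 25                 # A: all 25 students in the class
mu_supporters = [1.0] * 15 + [0.0] * 10  # B: 15 crisp supporters

proportion = sigma_count_conditional(mu_students, mu_supporters)
print(proportion)                              # 0.6
print(round(q_more_than_half(proportion), 6))  # 0.8
```

For fuzzy supporters, it is enough to replace the 0/1 values in `mu_supporters` by support degrees in $[0, 1]$.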


2.3.5 Fuzzy Inference Systems 


Fuzzy systems are a particular type of Knowledge-Based System, where knowledge
is represented by rules and concepts are represented by fuzzy sets. They
are typically used to describe a function with $m$ inputs and one output. In
general, there are $N$ rules of the form

$$R_i: \text{IF } x^1 \text{ is } A_i^1 \text{ and } \ldots \text{ and } x^m \text{ is } A_i^m \text{ THEN } y \text{ is } B_i, \tag{2.19}$$

where the $x^j$ correspond to the input variables of the system, $y$ is the output,
and $A_i^j$ and $B_i$ denote fuzzy terms, represented by their corresponding membership
functions $\mu_{A_i^j}$ and $\mu_{B_i}$. For example, a term $A_i^j$ might correspond to near, as
in Example 2.66.

From now on, we will restrict the discussion to a single-input, single-output
system. We will denote the input variable by $x$. Under this restriction,
the rules follow the structure:

$$R_i: \text{IF } x \text{ is } A_i \text{ THEN } y \text{ is } B_i. \tag{2.20}$$


Given a set of fuzzy rules $\{R_i\}_i$, the system computes the value for variable
$y$ given a value for variable $x$, say $x_0$. As the terms $A_i$ and $B_i$ are fuzzy sets,
the output of the system for input $x_0$ is also a fuzzy set, that is, a function from
the range of $y$ into $[0, 1]$ (a possibility distribution). The actual computation
of this output value depends on the interpretation of the fuzzy rules. There
are two main interpretations: disjunctive and conjunctive rules.


The case of disjunctive rules 


When a fuzzy system is described by $N$ rules of the form of Equation 2.20,
the output of the system for $x = x_0$ is computed using the following steps:


1. Compute the truth degree or satisfaction degree for the antecedent of all
rules $R_i$. Let this value be $\alpha_i$. That is, $\alpha_i = \tau(x_0 \text{ is } A_i)$. In our case, as there
is a single condition in the antecedent, $\alpha_i = \mu_{A_i}(x_0)$. For more complex
antecedents, we would use the approach described in Section 2.3.4.

2. Compute the conclusion of rule $R_i$. The most common approach is Mamdani's
approach. From an operational point of view, Mamdani's approach
is as follows: for each rule $R_i$, its output fuzzy set $\mu_{B_i}$ is clipped according
to the degree of satisfaction $\alpha_i$. That is, the output of rule $R_i$ is
$\mu_{B_i} \wedge \alpha_i = \mu_{B_i} \wedge \mu_{A_i}(x_0)$. This expression means that
$\mu_{\tilde{B}_i}(y) = \mu_{B_i}(y) \wedge \mu_{A_i}(x_0)$. Formally, this computation is equivalent to
considering the input as the set $A' = \{x_0\}$, and then defining the output as either
$\cup_i (A' \circ R_i)$ or $A' \circ (\cup_i R_i)$, with $\circ$ being a max-min composition and $R_i$ being the
intersection of $A_i$ and $B_i$. The expressions $\cup_i (A' \circ R_i)$ and $A' \circ (\cup_i R_i)$
are equivalent. The description uses the first expression.

3. Compute the output of the set of rules $\{R_i\}_i$. Once we have the output of
each rule $R_i$, denoted by $\mu_{\tilde{B}_i}$, we define the output for the whole system
as the union of the outputs of each rule $R_i$. Using the maximum for the
union (the most usual operator), we obtain

$$\tilde{B} = \vee_{i=1}^{N} (B_i \wedge A_i(x_0)). \tag{2.21}$$

4. Finally, the output fuzzy set $\tilde{B}$ is usually defuzzified. This corresponds to
summarizing the whole fuzzy set $\tilde{B}$ in a single number. There are different
approaches. One of them is the center of gravity. It corresponds to defining
the output as follows:

$$y_0 = \frac{\int y\, \mu_{\tilde{B}}(y)\, dy}{\int \mu_{\tilde{B}}(y)\, dy}.$$

Thus, the output of the system is $y = y_0$.
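
The four steps above can be sketched in code for the single-input, single-output case. The sketch below makes several assumptions that are not in the text: membership functions are triangular, the output domain is sampled on a uniform grid, and the integrals of the center of gravity are approximated by sums over that grid. The names `triangle` and `mamdani` are hypothetical.

```python
def triangle(a, b, c):
    """Triangular membership function with support (a, c) and peak b (a < b < c)."""
    def mu(v):
        if v <= a or v >= c:
            return 0.0
        if v <= b:
            return (v - a) / (b - a)
        return (c - v) / (c - b)
    return mu


def mamdani(rules, x0, ys):
    """Disjunctive (Mamdani) inference for rules "IF x is A_i THEN y is B_i".

    rules: list of pairs (mu_A_i, mu_B_i); ys: sample points of the output domain.
    Returns the sampled output fuzzy set and its approximate center of gravity.
    """
    # Step 1: satisfaction degree of each antecedent.
    alphas = [mu_a(x0) for mu_a, _ in rules]
    # Step 2: clip each consequent by its satisfaction degree.
    clipped = [[min(mu_b(y), alpha) for y in ys]
               for (_, mu_b), alpha in zip(rules, alphas)]
    # Step 3: union of the rule outputs, using the maximum.
    out = [max(column) for column in zip(*clipped)]
    # Step 4: center of gravity, approximated on the uniform grid ys.
    total = sum(out)
    if total == 0:
        raise ValueError("no rule fires for this input")
    y0 = sum(y * m for y, m in zip(ys, out)) / total
    return out, y0


rules = [
    (triangle(0, 2, 4), triangle(0, 2, 4)),
    (triangle(2, 4, 6), triangle(2, 4, 6)),
]
ys = [i / 100 for i in range(601)]  # output domain [0, 6]

_, y0 = mamdani(rules, 3.0, ys)
print(round(y0, 6))  # 3.0
```

In this symmetric toy system both rules fire with degree 0.5 at $x_0 = 3$, so the clipped union is symmetric around 3 and the center of gravity is 3.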


The case of conjunctive rules


When rules are interpreted in a conjunctive way, the output of a system can
be computed as either $\cap_i (A' \circ R_i)$ or $A' \circ (\cap_i R_i)$. While the two expressions
for disjunctive rules are equivalent, this is not true for conjunctive ones. That
is, in general, $\cap_i (A' \circ R_i) \neq A' \circ (\cap_i R_i)$. However, the equality holds when
$A'$ is a single value.

We will now describe the computation of the output when $A'$ is a single
value. For convenience, the description follows the computation of $\cap_i (A' \circ R_i)$.
The output of the system $\{R_i\}$ when $x = x_0$ is computed as follows:


1. Compute $A' \circ R_i$ for all rules $R_i$. First, the relation $R_i$ is defined in terms
of the implication function as follows: $R_i := I(A_i, B_i)$. Then, as $A'$ is
the singleton $A' = \{x_0\}$, it can be proved that
$\tilde{B}_i = A' \circ R_i = A' \circ I(A_i, B_i)$ is equivalent to $\tilde{B}_i(y) = I(A_i(x_0), B_i(y))$. The latter
expression is the outcome of rule $R_i$.

2. Compute the intersection of all the outcomes of the previous step. This is
expressed as follows:

$$\tilde{B}(y_0) = \wedge_i \left( I(A_i(x_0), B_i(y_0)) \right). \tag{2.22}$$
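
Equation 2.22 can be evaluated pointwise once an implication function is fixed. The sketch below is illustrative: the text does not fix the implication, so the Kleene-Dienes implication $I(a, b) = \max(1 - a, b)$ is used here as one possible choice, the minimum models the intersection, and the triangular membership functions are assumptions.

```python
def triangle(a, b, c):
    """Triangular membership function with support (a, c) and peak b (a < b < c)."""
    def mu(v):
        if v <= a or v >= c:
            return 0.0
        if v <= b:
            return (v - a) / (b - a)
        return (c - v) / (c - b)
    return mu


def kleene_dienes(a, b):
    """Kleene-Dienes implication, one possible implication function."""
    return max(1 - a, b)


def conjunctive_output(rules, x0, y, implication=kleene_dienes):
    """Membership of y in the output for input x0 (pointwise Equation 2.22)."""
    return min(implication(mu_a(x0), mu_b(y)) for mu_a, mu_b in rules)


rules = [
    (triangle(0, 2, 4), triangle(0, 2, 4)),
    (triangle(2, 4, 6), triangle(2, 4, 6)),
]

print(conjunctive_output(rules, 2.0, 1.0))  # 0.5
```

At $x_0 = 2$ the first rule is fully satisfied and contributes $B_1(y)$, while the second rule is not satisfied at all and contributes the value 1, so it does not constrain the output.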


2.4 Bibliographical Notes 


1. Measurement theory: The roots of measurement theory are old and 
diverse (see [238], Chapter 20, on Scale Types, for details). Recent work is 
strongly based on S. S. Stevens’s research. In particular, he defined scale 
type according to the class of permissible transformations (see [375, 376, 
377]). He introduced the terms ordinal, interval, log-interval, ratio, and 
absolute scales. Section 2.1 is based on [334] and [217]. The definition of 
measurement given in the first paragraph of this section is based on [417] 
and [376]. [334] and [217] underline the fact that one can always operate 
on numbers, but that the outcome might be meaningless. The definition of 
measurement as the construction of homomorphisms is taken from [217]. 
For a more complete account of measurement theory, see [386] and [238].
They complement [217].

2. Probability theory: There exists a large number of books on probability 
theory. For the first sections of Section 2.2 (random variables, expectation, 
and moments), the references [202] and [42] are adequate. The book by 
R. B. Ash [26] also deals with these topics and includes the results on the 
independence of variables given here in Section 2.2.3. Ash’s book is more 
mathematically oriented. 

Kolmogorov’s axioms were originally published in [215]. Nonparametric
methods are not discussed in detail in this book, although they are
mentioned in Section 2.2.4. Such methods are described in [188] and [225].

The normal distribution was discovered by De Moivre in 1733, but later 
independently rediscovered by Gauss (1809) and Laplace (1812) [314]. 

3. Robust statistics and outliers: Robust statistics is described in [194,
333, 181]. [194] was a seminal work in the area. [181] (Chapter 8) includes
some discussion on the interest and usefulness of robust statistics.

Order statistics are included in some standard books on statistics, such
as [381] (Chapter 14). For more details, the reader can use the book by
Arnold, Balakrishnan, and Nagaraja [21]. It is a course on order statistics.

Barnett and Lewis [34] is a book devoted to the study of outliers in
statistical data. The book also includes descriptions of L-estimators,
M-estimators, and R-estimators. The book by Hawkins [130] can also be used
for outliers.

4. Regression and robust regression: Regression is described in most 
books on statistics. It is worth mentioning [342, 352]. [342] includes a 
chapter on robust regression. For robust regression see also [339]. This 
book focuses on this topic and describes robust regression methods in 
detail, including some implementation issues. Nevertheless, [339] should be
complemented with [338], which describes some improvements to LMS and
LTS methods. The examples for LMS and LTS included in Section 2.2.8
have been solved following this work.

For regression, [353], which reviews matrix algebra, is also useful. It 
includes chapters devoted to inverse matrices and generalized inverses, 
as well as such simpler operations as rank determination and methods 
for solving linear equations. Additionally, it includes a chapter on regres- 
sion. Computational issues are also considered. [353] (Chapter 8.6), [352] 
(Chapter 1.5), and [354] (p. 469) describe algorithms for computing the 
generalized inverse. The computation of the rank of a matrix is given in 
[353] (Chapter 7.2). 

In relation to regression, the first description of the least sum of squares
found in the literature was due to Legendre [224], but Gauss later claimed
its discovery. The interesting papers by Plackett [323] and Stigler [378]
discuss this matter. Harter, in his series of articles [123, 124, 125, 126,
127, 128, 129], traces the history of this field (mainly from Galileo Galilei,
1632, to 1974).

5. Fuzzy sets: Fuzzy sets were originally defined by L. A. Zadeh in 
1965 [459]. [211] is a standard textbook in this field, and the Handbook of 
Fuzzy Systems [341] gives an account of the main topics. L-fuzzy sets are 
an alternative to membership functions in [0, 1]. L-fuzzy sets were intro- 
duced by Goguen [165]. For fuzzy sets and models related to uncertainty, 
see [219]. The concept “computing with words” and the use of fuzzy sets in
this framework are detailed in [462]. See also [463] for some related research
in this area.

The origin of t-norms can be found in probabilistic metric spaces [260, 
350, 349]. Original characterizations of t-norms and negations can be 
found, respectively, in [227] and [418]. Initial results in the field of fuzzy 
logic establishing t-norms, t-conorms, and negations can be found in [18]. 
See also the book by Klement, Mesiar and Pap [210] on t-norms. 

The book by Alsina, Frank, and Sklar [17], devoted to associative
operators, includes results concerning t-norms and t-conorms. Fuzzy systems,
especially for applications in fuzzy control, are explained in several
specific books. Fuzzy control is described in [98]. The distinction between
disjunctive and conjunctive rules can be found in [211].

Fuzzy quantifiers, which were studied by Zadeh (see, e.g., [461]), are
reviewed by Liu and Kerre in [232] and [233].


3 


Introduction to Functional Equations 


1er PROBLÈME. Déterminer la fonction $\phi(x)$,
de manière qu'elle reste continue entre deux
limites réelles quelconques de la variable $x$,
et que l'on ait pour toutes les valeurs réelles
des variables $x$ et $y$

(1) $\phi(x + y) = \phi(x) + \phi(y)$ ¹
Augustin-Louis Cauchy, [66] (p. 104)


Functional equations are equations where the unknowns are functions. A well-known
example of a functional equation is the following Cauchy equation:

$$\phi(x + y) = \phi(x) + \phi(y). \tag{3.1}$$

A function $\phi$ is a solution of this equation if, for any two values $x$ and $y$, the
application of $\phi$ to $x + y$ equals the addition of the application of $\phi$ to $x$ and
to $y$. Therefore, the equation establishes conditions that functions $\phi$ have to
satisfy. Typical solutions of this Cauchy equation are the functions $\phi(x) = \alpha x$
for an arbitrary value of $\alpha$.

In information fusion, functional equations can be used in two different 
contexts. 


1. Functional equations can be used when we need to define an aggregation 
operator and we know which basic properties it has to satisfy. We can 
express the conditions of such an operator using functional equations. 
The operator is then derived from the equations. 

2. Functional equations can be used to study the properties of information
fusion methods. This is so because they can characterize the operators.
Here, a characterization consists of finding a minimum set of properties
(a minimum set of equations) that uniquely implies the operator. It is
important to say that the set of properties that imply an operator is
usually not unique.


¹ Determine the function $\phi(x)$ so that it remains continuous between two arbitrary
real limits of the variable $x$, and that for all real values of the variables $x$ and $y$
one has (1) $\phi(x + y) = \phi(x) + \phi(y)$.


An example of the use of functional equations in the definition of
numerical aggregation operators is given below. The theorem establishes that
the most general solution of two functional equations (Equations 3.2 and 3.3)
is the weighted mean with unrestricted weights.


Theorem 3.1. The most general function of two variables satisfying the functional
equations

$$\phi(x + t, y + t) = \phi(x, y) + t \tag{3.2}$$

and

$$\phi(xu, yu) = \phi(x, y)u \quad \text{for } u \neq 0 \tag{3.3}$$

for all $x$, $y$, $t$, and $u$ is

$$\phi(x, y) = (1 - k)x + ky. \tag{3.4}$$
Proof. Let $y' = (y - x)$, $x' = 0$, and $t = x$; then, Equation 3.2 corresponds to

$$\phi(x, y) = \phi(0 + x, (y - x) + x) = \phi(0, (y - x)) + x. \tag{3.5}$$

Now, for $x \neq y$, we can use Equation 3.3 to rewrite $\phi(0, y - x)$ as follows:

$$\phi(0, y - x) = \phi(0(y - x), 1(y - x)) = (y - x)\phi(0, 1) \quad \text{for } (y - x) \neq 0.$$

This equation means that

$$\phi(0, y - x) = k(y - x) \quad \text{with } k = \phi(0, 1). \tag{3.6}$$

Thus, for $y \neq x$, Equations 3.5 and 3.6 lead to

$$\phi(x, y) = k(y - x) + x = (1 - k)x + ky.$$

Now, let us consider the case of $y = x$; taking $y = x = 0$, we have
that Equation 3.3 implies that $\phi(0, 0) = 0$ (as $\phi(0, 0) = \phi(0, 0)u$ for all $u \neq 0$).
Thus, Equation 3.2 implies that
$\phi(x, x) = \phi(0, 0) + x = x$. As, in both cases, Equation 3.4 holds, this equation
is implied by Equations 3.2 and 3.3.

Equation 3.4 satisfies both Equation 3.2 and Equation 3.3. Therefore, the
theorem is proved.
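
The last step of the proof, that Equation 3.4 satisfies Equations 3.2 and 3.3, can also be checked numerically. The sketch below is only a sanity check on random inputs, not a proof; the function names, the sampling ranges, and the tolerance are arbitrary choices.

```python
import random


def phi(x, y, k):
    """Weighted mean with unrestricted weight k (Equation 3.4)."""
    return (1 - k) * x + k * y


def satisfies_equations(k, trials=1000, tol=1e-9, seed=0):
    """Check Equations 3.2 and 3.3 for phi(., ., k) on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, t = (rng.uniform(-10, 10) for _ in range(3))
        u = rng.choice([-1, 1]) * rng.uniform(0.1, 10)  # u != 0
        if abs(phi(x + t, y + t, k) - (phi(x, y, k) + t)) > tol:
            return False  # Equation 3.2 violated
        if abs(phi(x * u, y * u, k) - phi(x, y, k) * u) > tol:
            return False  # Equation 3.3 violated
    return True


print(satisfies_equations(0.3))   # True
print(satisfies_equations(-1.5))  # True: k is not restricted to [0, 1]
print(phi(0, 1, 0.3))             # 0.3, that is, k = phi(0, 1)
```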














Alternatively, the theorem can be seen as a characterization of Equation 3.4.
Nevertheless, other characterizations are possible. For example, the
following theorem gives another characterization of Equation 3.4.


Theorem 3.2. The most general function of two variables satisfying the functional
equations

$$\phi(x_1 + y_1, x_2 + y_2) = \phi(x_1, x_2) + \phi(y_1, y_2) \tag{3.7}$$

and

$$\phi(x, x) = x \tag{3.8}$$

for all $x_1$, $x_2$, $y_1$, $y_2$, and $x$ is

$$\phi(x, y) = (1 - k)x + ky. \tag{3.9}$$


3.1 Basic Functional Equations 


In this section, some examples of basic functional equations are given with
their solutions. We start with the Cauchy equation described above, and then
present some of Cauchy's other equations: the exponential, the logarithm,
and $\phi(xy) = \phi(x)\phi(y)$. We will also present the Jensen equations. For some
of them, generalizations are given. Note that, from now on, we will refer to
$\phi(x + y) = \phi(x) + \phi(y)$ as the first Cauchy equation, as it is extensively used
in the rest of this chapter.


Proposition 3.3. If a continuous function $\phi : \mathbb{R} \to \mathbb{R}$ satisfies the Cauchy
equation

$$\phi(x + y) = \phi(x) + \phi(y),$$

then there exists a real constant $\alpha$ such that

$$\phi(x) = \alpha x$$

for all real $x$.


The proposition holds even if the function $\phi$ is only assumed to be continuous at a
point, or monotone, or bounded on one side on an interval of positive length. In the
following, we assume that such conditions hold whenever we encounter the Cauchy
equation.

Now, we consider a generalization of the previous equation. This generalization
is for functions $\phi$ on $\mathbb{R}^N$.


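Proposition 3.4. If a real function $\phi : \mathbb{R}^N \to \mathbb{R}$ satisfies

$$\phi(x_1 + y_1, x_2 + y_2, \ldots, x_N + y_N) = \phi(x_1, x_2, \ldots, x_N) + \phi(y_1, y_2, \ldots, y_N), \tag{3.10}$$

then it is of the form

$$\phi(x_1, x_2, \ldots, x_N) = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_N x_N$$

for arbitrary real constants $\alpha_i$.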


Proof. To solve this equation, we will reduce it to the first Cauchy equation.
To do so, we start by generalizing the equation, by induction, to

$$\phi(\mathbf{x}_1 + \mathbf{x}_2 + \cdots + \mathbf{x}_p) = \phi(\mathbf{x}_1) + \phi(\mathbf{x}_2) + \cdots + \phi(\mathbf{x}_p)$$

for all $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p \in \mathbb{R}^N$ and $p = 2, 3, \ldots$
Therefore, as any vector $\mathbf{x}$ can be rewritten as

$$\mathbf{x} = (x_1, x_2, \ldots, x_N) = (x_1, 0, \ldots, 0) + (0, x_2, \ldots, 0) + \cdots + (0, 0, \ldots, x_N),$$

we can express $\phi(\mathbf{x})$ as

$$\phi(\mathbf{x}) = \phi((x_1, x_2, \ldots, x_N)) = \phi((x_1, 0, \ldots, 0)) + \phi((0, x_2, \ldots, 0)) + \cdots + \phi((0, 0, \ldots, x_N)),$$

which, defining

$$\psi_i(x_i) = \phi((0, \ldots, 0, x_i, 0, \ldots, 0)), \tag{3.11}$$

where $x_i$ occupies the $i$th position in the vector, can be rewritten as

$$\phi(\mathbf{x}) = \psi_1(x_1) + \psi_2(x_2) + \cdots + \psi_N(x_N) = \sum_{i=1}^{N} \psi_i(x_i) \tag{3.12}$$

for all $x_i \in \mathbb{R}$ and $i = 1, 2, \ldots, N$.
Equation 3.10 (and 3.11) implies that $\psi_i$ satisfies the first Cauchy equation
for all $i = 1, 2, \ldots, N$ and for all $x$, $y$, and $x + y$ in $\mathbb{R}$. That is,

$$\psi_i(x + y) = \psi_i(x) + \psi_i(y). \tag{3.13}$$

Therefore, applying Proposition 3.3, we have that $\psi_i$ is of the form

$$\psi_i(x) = \alpha_i x.$$

Now, replacing $\psi_i$ with its equivalent expression in Equation 3.12, we have

$$\phi(\mathbf{x}) = \phi(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} \alpha_i x_i.$$

As this expression satisfies Equation 3.10, the proposition is proved.





Proposition 3.5. If a function $\phi : \mathbb{R} \to \mathbb{R}$ that is not identically zero satisfies the
equation

$$\phi(x + y) = \phi(x)\phi(y), \tag{3.14}$$

then there exists a real constant $\alpha$ such that

$$\phi(x) = e^{\alpha x}.$$


Equation 3.14 is known as the exponential equation.
A similar proposition applies for the logarithm equation. That is, when

$$\phi(x \cdot y) = \phi(x) + \phi(y) \tag{3.15}$$

holds for all positive $x$ and $y$, then $\phi$ is of the form

$$\phi(x) = \alpha \log(x).$$

The most general solution of $\phi(x \cdot y) = \phi(x) + \phi(y)$ when the equation is
valid not only for positive but for all real $x \neq 0$ and $y \neq 0$ is $\phi(x) = \alpha \log|x|$.
Finally, the solution of

$$\phi(xy) = \phi(x)\phi(y), \tag{3.16}$$

when it holds for all positive $x$ and $y$, is $\phi(x) = x^{\alpha}$ or $\phi(x) = 0$.

The proof of the equations is obtained through the transformation of
Equations 3.14 and 3.15 into the first Cauchy equation. For example, the
substitutions $x = e^u$, $y = e^v$, and $\phi(e^w) = \psi(w)$ into the logarithm equation
(Equation 3.15) yield the equation $\psi(u + v) = \psi(u) + \psi(v)$. Therefore, $\psi(u) = \alpha u$,
and, thus, $\phi(e^u) = \alpha u$ or $\phi(x) = \alpha \log(x)$. Similarly, $\phi(x) = e^{\psi(x)}$ also leads
to the first Cauchy equation for Equation 3.14. Equation $\phi(xy) = \phi(x)\phi(y)$ is
also solved by rewriting it as the first Cauchy equation.
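
The reduction of the logarithm equation to the first Cauchy equation can be illustrated numerically. The sketch below is a check on a few sample points under the assumption $\alpha = 2.5$ (an arbitrary value); `phi` is the claimed solution and `psi` is the transformed function $\psi(w) = \phi(e^w)$.

```python
import math

ALPHA = 2.5  # arbitrary constant chosen for the illustration


def phi(x):
    """Claimed solution of the logarithm equation (Equation 3.15)."""
    return ALPHA * math.log(x)


def psi(w):
    """Transformed function psi(w) = phi(e^w)."""
    return phi(math.exp(w))


def is_close(a, b, tol=1e-9):
    return abs(a - b) <= tol


checks = []
for x, y in [(0.5, 3.0), (2.0, 7.0), (10.0, 0.1)]:
    u, v = math.log(x), math.log(y)
    checks.append(is_close(phi(x * y), phi(x) + phi(y)))  # Equation 3.15
    checks.append(is_close(psi(u + v), psi(u) + psi(v)))  # first Cauchy equation
    checks.append(is_close(psi(u), ALPHA * u))            # psi(u) = alpha * u

print(all(checks))  # True
```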


Proposition 3.6. Let $N$ be any fixed $N \ge 2$, let $x_i$ be in $J$ (an open real
interval) for all $i = 1, \ldots, N$, and let $\phi$ be a function continuous at a point or
monotone or bounded on one side on an interval of positive length. Then, the
general solution of the N-term Jensen equation

$$\phi\left(\frac{1}{N} \sum_{i=1}^{N} x_i\right) = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i) \tag{3.17}$$

is

$$\phi(x) = \alpha x + \beta, \tag{3.18}$$

with $\alpha$ and $\beta$ arbitrary real constants.


Note that, in the case of $N = 2$, the above equation reduces to

$$\phi\left(\frac{x + y}{2}\right) = \frac{\phi(x) + \phi(y)}{2}. \tag{3.19}$$

The 2-term Jensen equation is known as the Jensen equation. It comes from
the definition of convex functions. A function which satisfies $\phi((x + y)/2) \le
(\phi(x) + \phi(y))/2$ in a certain interval is said to be convex in that interval.
Geometrically, when this inequality holds, any chord lies above or on the
curve (see Figure 3.1). The proposition shows that, for continuous functions,
the equality can only be satisfied for lines of the form $\phi(x) = \alpha x + \beta$. A
generalization for the N-term equation is given below.

Fig. 3.1. Convex functions and Jensen equation


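Proposition 3.7. Let $N$ be any fixed $N \ge 2$, let $x_i \in J$ (an open interval),
and let $\phi$ and $\psi$ be strictly monotone functions. Then, the general solution of

$$\phi^{-1}\left(\frac{1}{N} \sum_{i=1}^{N} \phi(x_i)\right) = \psi^{-1}\left(\frac{1}{N} \sum_{i=1}^{N} \psi(x_i)\right) \tag{3.20}$$

is

$$\phi(x) = \alpha \psi(x) + \beta,$$

with $\alpha$ and $\beta$ arbitrary real constants such that $\alpha \neq 0$.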
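A generalization of the Jensen equation (Equation 3.19) is given here.
Proposition 3.8. The general solution for $\phi : \mathbb{R}^2 \to \mathbb{R}$ of

$$\phi\left(\frac{x_1 + y_1}{2}, \frac{x_2 + y_2}{2}\right) = \frac{\phi(x_1, x_2) + \phi(y_1, y_2)}{2} \tag{3.21}$$

is

$$\phi(x, y) = \alpha x + \beta y + c.$$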


Now, we present a completely different type of functional equation that
will be useful in Section 3.3. In fact, the problem corresponds to a system of
equations to define the area of a rectangle. Two equations are established that
define the area on the basis of the two sides of the rectangle: $\phi(side_1, side_2)$.
The meaning of the equations is represented in Figure 3.2.


Proposition 3.9. The most general positive solution of the system of equations

$$\phi(x_1 + x_2, y) = \phi(x_1, y) + \phi(x_2, y) \tag{3.22}$$

$$\phi(x, y_1 + y_2) = \phi(x, y_1) + \phi(x, y_2) \tag{3.23}$$

is

$$\phi(x, y) = kxy.$$

Fig. 3.2. The area of a rectangle: graphical representation of Equation 3.22 (left)
and Equation 3.23 (right)


Proof. Let us start by considering Equation 3.22 with $y$ held fixed.
Then, defining $\psi(x) = \phi(x, y)$, the equation reduces to
$\psi(x_1 + x_2) = \psi(x_1) + \psi(x_2)$. As this is the first Cauchy equation, $\psi(x)$ is of
the form $\psi(x) = \alpha x$. Nevertheless, although we can conclude that $\phi(x, y)$ is
also of the form $\alpha x$, $\alpha$ depends on $y$; i.e., the constant $\alpha$ we have obtained was
for a particular $y$, and, thus, $\alpha$ is a function of $y$. Therefore, $\phi(x, y) = \alpha(y) x$.

Similarly, considering Equation 3.23, we conclude that $\phi(x, y)$ is the product of $y$
and a function of $x$. Formally, $\phi(x, y) = \beta(x) \cdot y$.

Thus, the following equation holds:

$$\phi(x, y) = \alpha(y) \cdot x = \beta(x) \cdot y. \tag{3.24}$$

Now, dividing by the product $x \cdot y$, we obtain

$$\frac{\phi(x, y)}{xy} = \frac{\alpha(y)}{y} = \frac{\beta(x)}{x}.$$

The only way that $\alpha(y)/y$ is equal to $\beta(x)/x$ for all $x$ and $y$ is with both
quotients always equal to a constant. Denoting such a constant by $k$, we have
that the following holds:

$$\frac{\phi(x, y)}{xy} = \frac{\alpha(y)}{y} = \frac{\beta(x)}{x} = k.$$

Therefore, $\phi(x, y) = kxy$. Finally, as this expression satisfies
Equations 3.22 and 3.23, the proposition is proved.




















The value of the constant k depends on the scale we consider for the area 
of the rectangle. If we add the requirement that the area of a rectangle with 
x =1 and y = 1 is equal to 1, then k should also be 1. 

In Section 2.3.1, some other examples of functional equations are given. 
They establish the t-norms (Definition 2.43), t-conorms (Definition 2.47), and 
complements (Definition 2.53). 


3.2 Using Functional Equations for Information Fusion


In this section, we present an example of the application of functional equa- 
tions to construct an operator that aggregates numerical information. We start 
by formalizing the problem, and then turn to the development of its solution. 
The example illustrates the main procedures to prove theorems by means of 
functional equations. 


Example 3.10. In a committee, m different projects have been evaluated by 
N different experts to allocate a total budget of s euros. Then, a decision 
maker has to aggregate the information of all the experts and give a final 
allocation of the budget. The problem is how to formalize the aggregation 
process. We start by formalizing the available data and then the requirements 
for the aggregation process. 

Available data is modeled as follows: $x_j^i$ stands for the quantity that the
$i$th expert assigns to the $j$th project. $X$ stands for the whole matrix with the
opinions of all the experts ($X = \{x_j^i\}$) and $\mathbf{x}^i$ corresponds to the vector with
the values of the $i$th expert for all the projects. Similarly, $\mathbf{x}_j$ is the vector with
all the assignments to the $j$th project. Table 3.1 illustrates the data supplied
by the experts. In this table, the $i$th expert is denoted by $E_i$.

The aggregation process assumes the following requirements on the data:


R1: The amount allocated to projects is never negative; i.e., $x_j^i \ge 0$ for all $i$
and $j$.
R2: Each of the experts distributes the whole amount $s$ among all the projects;
i.e., $\sum_{j=1}^{m} x_j^i = s$ for all $i$.


Once the experts have supplied their evaluation, the decision maker (denoted
by DM in the last row of Table 3.1) has to make a final assignment.
This final assignment is expressed as a function $g$ of $X$. That is, $g(X)$ denotes
all the assignments and $g_j(X)$ corresponds to the final assignment to a particular
project $j$. Therefore, $g(X) = (g_1(X), \ldots, g_m(X))$. Now, we consider our
basic assumptions about functions $g_j$ for $j \in \{1, \ldots, m\}$:


(i) The total amount distributed by $g$ should be $s$. This condition is only
required when all the experts assign the whole quantity $s$, i.e., when
requirement R2 above is fulfilled.

(ii) The final quantity that the decision maker assigns to a project only
depends on the assignments to that project. That is, instead of considering
functions $g_j$ on the whole matrix $X$, it is enough to consider functions
defined on a single column. Denoting such function by $f_j$ and the column
by $\mathbf{x}_j$, we have $g_j(X) = f_j(\mathbf{x}_j)$. These functions appear in the last row
of Table 3.1. The definition of $f_j$ in this way satisfies the condition of
independence of irrelevant alternatives.

(iii) If all the experts assign 0 to a certain project, then the decision maker
will also assign 0 to the project. That is, $f_j(\mathbf{0}) = 0$.







Table 3.1. Assignment of $s$ euros to $m$ projects by $N$ human experts. $\{E_1, \ldots, E_N\}$
stand for experts, $\{Proj_1, \ldots, Proj_m\}$ for projects, $\mathbf{x}_j = (x_j^1 \cdots x_j^N)$ for assignments
to the $j$th project, and $f_j(\mathbf{x}_j)$ for the final decision for the $j$th project


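All together, we have that the function $g(X)$ has to satisfy the following
conditions:
1. $g(X) = (g_1(X), g_2(X), \ldots, g_m(X)) = (f_1(\mathbf{x}_1), f_2(\mathbf{x}_2), \ldots, f_m(\mathbf{x}_m))$, where
$f_j : [0, s]^N \to \mathbb{R}^+$ for $j = 1, \ldots, m$.
2. $\sum_{j=1}^{m} \mathbf{x}_j = \mathbf{s}$ implies that $\sum_{j=1}^{m} f_j(\mathbf{x}_j) = s$.
3. $f_j(\mathbf{0}) = 0$ for $j = 1, \ldots, m$.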


Functions f; that satisfy these conditions are characterized in the next 
proposition. 


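Proposition 3.11. The general solution of the system

$$f_j : [0, s]^N \to \mathbb{R}^+ \quad \text{for } j \in \{1, \ldots, m\} \tag{3.26}$$

$$\sum_{j=1}^{m} \mathbf{x}_j = \mathbf{s} \quad \text{implies that} \quad \sum_{j=1}^{m} f_j(\mathbf{x}_j) = s \tag{3.27}$$

$$f_j(\mathbf{0}) = 0 \quad \text{for } j = 1, \ldots, m \tag{3.28}$$

for a given $m > 2$ is given by

$$f_1(\mathbf{x}) = f_2(\mathbf{x}) = \cdots = f_m(\mathbf{x}) = f((x_1, x_2, \ldots, x_N)) = \sum_{i=1}^{N} \alpha_i x_i$$

where $\alpha_1, \ldots, \alpha_N$ are nonnegative constants satisfying $\sum_{i=1}^{N} \alpha_i = 1$, but are
otherwise arbitrary.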


Proof. We start by considering Equation 3.27. As this equation holds for all
$\mathbf{x}_j$, then, in particular, it should hold for the substitutions $\mathbf{x}_1 = \mathbf{s}$,
$\mathbf{x}_2 = \cdots = \mathbf{x}_m = \mathbf{0}$. Then, as $f_j(\mathbf{0}) = 0$, according to Equation 3.28, we get

$$f_1(\mathbf{s}) = s. \tag{3.29}$$

Nevertheless, the selection of $\mathbf{x}_1$ was arbitrary. Therefore, the equality
holds for all $j$. That is,

$$f_j(\mathbf{s}) = s \quad \text{for all } j \in \{1, 2, \ldots, m\}.$$


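Let us now consider a different substitution in the same equation: $\mathbf{x}_1 = \mathbf{z}$,
$\mathbf{x}_3 = \mathbf{s} - \mathbf{z}$, $\mathbf{x}_2 = \mathbf{x}_4 = \cdots = \mathbf{x}_m = \mathbf{0}$. With this substitution, and taking
advantage of Equation 3.28, we get

$$f_1(\mathbf{z}) + f_3(\mathbf{s} - \mathbf{z}) = s.$$

Therefore, the following holds:

$$f_1(\mathbf{z}) = s - f_3(\mathbf{s} - \mathbf{z}) \quad \text{for all } \mathbf{z} \in [0, s]^N. \tag{3.30}$$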


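Let us consider again another substitution in the same Equation 3.27. In
this case, $\mathbf{x}_1 = \mathbf{x}$, $\mathbf{x}_2 = \mathbf{y}$, $\mathbf{x}_3 = \mathbf{s} - \mathbf{x} - \mathbf{y}$, and
$\mathbf{x}_4 = \cdots = \mathbf{x}_m = \mathbf{0}$. Then, we get

$$f_1(\mathbf{x}) + f_2(\mathbf{y}) + f_3(\mathbf{s} - \mathbf{x} - \mathbf{y}) = s,$$

which is equivalent to

$$f_1(\mathbf{x}) + f_2(\mathbf{y}) = s - f_3(\mathbf{s} - \mathbf{x} - \mathbf{y}). \tag{3.31}$$

Note that the right-hand side of this equation equals the right-hand side of
Equation 3.30 with $\mathbf{z} = \mathbf{x} + \mathbf{y}$. So, $s - f_3(\mathbf{s} - \mathbf{x} - \mathbf{y}) = f_1(\mathbf{x} + \mathbf{y})$.
Taking this into account, we obtain the following equation:

$$f_1(\mathbf{x}) + f_2(\mathbf{y}) = f_1(\mathbf{x} + \mathbf{y}) \quad \text{for all } \mathbf{x}, \mathbf{y}, \mathbf{x} + \mathbf{y} \in [0, s]^N. \tag{3.32}$$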


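As this equation holds for all $\mathbf{x}$, it also holds for $\mathbf{x} = \mathbf{0}$. So, as $f_1(\mathbf{0}) = 0$ (by
Equation 3.28 for $j = 1$), the following can be established:

$$f_2(\mathbf{y}) = f_1(\mathbf{y}).$$

As the selection of functions $f_1$ and $f_2$ was arbitrary, the equality can be
established for all $f_j$. We will denote this function by $f$:

$$f_1 = f_2 = \cdots = f_m = f.$$

Using $f$, Equation 3.32 is rewritten

$$f(\mathbf{x}) + f(\mathbf{y}) = f(\mathbf{x} + \mathbf{y}) \quad \text{for all } \mathbf{x}, \mathbf{y}, \mathbf{x} + \mathbf{y} \in [0, s]^N. \tag{3.33}$$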


This equation was solved in Proposition 3.4; therefore, $f$ is as follows:

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i x_i.$$

Nevertheless, as Equation 3.26 implies that $f_i(\mathbf{x}) \ge 0$ for all $i = 1, 2, \ldots$
and, thus, $\alpha_i x_i \ge 0$ for all $x_i \in [0, s]$, we conclude that $\alpha_i \ge 0$.
Moreover, taking into account Equation 3.29, which for this particular
form for $f$ is equivalent to $\sum_{i=1}^{N} \alpha_i s = s$, we further constrain the values for
$\alpha_i$, requiring

$$\sum_{i=1}^{N} \alpha_i = 1.$$

Finally, as the functions

$$f_1(\mathbf{x}) = f_2(\mathbf{x}) = \cdots = f_m(\mathbf{x}) = f((x_1, x_2, \ldots, x_N)) = \sum_{i=1}^{N} \alpha_i x_i,$$

with $\alpha_1, \alpha_2, \ldots, \alpha_N$ such that $\sum_{i=1}^{N} \alpha_i = 1$,
satisfy Equations 3.26, 3.27, and 3.28, the proposition is proved.





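Corollary 3.12. For a system satisfying

$$f_j : [0, s]^N \to \mathbb{R}^+ \quad \text{for } j \in \{1, \ldots, m\} \tag{3.34}$$

$$\sum_{j=1}^{m} \mathbf{x}_j = \mathbf{s} \quad \text{implies that} \quad \sum_{j=1}^{m} f_j(\mathbf{x}_j) = s \tag{3.35}$$

$$f_j(\mathbf{0}) = 0 \quad \text{for } j = 1, \ldots, m \tag{3.36}$$

for a given $m > 2$, there exists a probability $P$ such that $f$ is represented as
an expectation:

$$f_1(\mathbf{x}) = f_2(\mathbf{x}) = \cdots = f_m(\mathbf{x}) = E_P(\mathbf{x}).$$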
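
The solution given by Proposition 3.11 is easy to try on data shaped like Table 3.1. The following sketch uses invented figures: three experts, three projects, a budget of $s = 100$, and expert weights $(0.5, 0.25, 0.25)$; the function name `aggregate_budget` is hypothetical. It shows that, when every expert distributes the whole budget, the weighted mean of each column also distributes the whole budget.

```python
def aggregate_budget(X, alphas):
    """Apply f(x_j) = sum_i alpha_i * x_j^i to every project column j.

    X[i][j] is the amount expert i assigns to project j;
    alphas[i] is the weight of expert i.
    """
    if any(a < 0 for a in alphas) or abs(sum(alphas) - 1) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    n_projects = len(X[0])
    return [sum(a * row[j] for a, row in zip(alphas, X)) for j in range(n_projects)]


# Invented data: each row (expert) sums to the budget s = 100.
X = [
    [50, 30, 20],
    [20, 60, 20],
    [40, 40, 20],
]
alphas = [0.5, 0.25, 0.25]

g = aggregate_budget(X, alphas)
print(g)       # [40.0, 40.0, 20.0]
print(sum(g))  # 100.0
```

Reading the weights as a probability distribution over the experts, each entry of `g` is the expected assignment to the corresponding project, which is the interpretation stated in Corollary 3.12.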





3.3 Solving Functional Equations 


Now, we review the main techniques that are commonly in use to solve func- 
tional equations. They will be illustrated with some examples and refer to the 
propositions and proofs given in this section. 


Variables by values: When a functional equation is satisfied for all values 
d in a domain D, it must also be satisfied by a particular value do. The 
substitution of any variable with a particular value might simplify the 
equation. 
The proof of Proposition 3.11 gives several examples of this technique.
One of them is the substitution of $x_1$ with $s$, and of $x_2, \ldots, x_m$ with $0$ in
Equation 3.27, which yields Equation 3.29.

Sometimes, different substitutions are applied to the same equation,
obtaining different equations. The proof of Proposition 3.11 also illustrates
this case. Three substitutions were applied to Equation 3.27: (i) $x_1 = s$, $x_2 = \cdots
= x_m = 0$; (ii) $x_1 = z$, $x_3 = s - z$, $x_2 = x_4 = \cdots = x_m = 0$; (iii) $x_1 = x$,
$x_2 = y$, $x_3 = s - x - y$, $x_4 = \cdots = x_m = 0$. These substitutions led to Equations 3.29,
3.30, and 3.31.








Function transformation: This is to replace a function by another one, so
that the functional equation is transformed into an easier one.
For illustration, let us consider the following equation:

$$\phi^{-1}\Big(\frac{1}{N} \sum_{i=1}^{N} \phi(a_i)\Big) = 1 \Big/ \phi^{-1}\Big(\frac{1}{N} \sum_{i=1}^{N} \phi(1/a_i)\Big). \tag{3.37}$$

This equation can be transformed, by considering the function $\psi(a) =
\phi(1/a)$ (so that $\psi^{-1}(u) = 1/\phi^{-1}(u)$), into

$$\phi^{-1}\Big(\frac{1}{N} \sum_{i=1}^{N} \phi(a_i)\Big) = \psi^{-1}\Big(\frac{1}{N} \sum_{i=1}^{N} \psi(a_i)\Big).$$

As this corresponds to the generalization of the N-term Jensen equation
(Proposition 3.7), expressions for $\phi$ and $\psi$ are obtained.


Variable transformation: In this case, a transformation is applied to a
variable to simplify an equation. We illustrate this case with the logarithm
equation (Equation 3.15): $\phi(x \cdot y) = \phi(x) + \phi(y)$. We have seen that with
the transformations $x = e^u$ and $y = e^v$, we get $\phi(e^u \cdot e^v) = \phi(e^u) + \phi(e^v)$.
Then, using the function transformation $\phi(e^u) = \psi(u)$, we get the first
Cauchy equation, $\psi(u + v) = \psi(u) + \psi(v)$, which is solved, giving $\psi(u) = \alpha u$, and, thus, $\phi(x) =
\alpha \log x$.


Considering a more general equation: The solution of an equation A is
obtained by considering a more general one, B. This means that a particu-
lar parameterization of B leads to the solution of equation A. Accordingly,
the solutions of B are solutions of A when the same parameterizations
are considered. For example, Proposition 3.6 can be solved with Proposi-
tion 3.7, as Equation 3.17 corresponds to Equation 3.20 when $\psi(x) = x$.
Therefore, $\phi(x) = \alpha \psi(x) + \beta$ with $\psi(x) = x$ (i.e., $\phi(x) = \alpha x + \beta$) is the
solution of Proposition 3.6.


Variables as constants: First, the equation is solved taking as a constant
one of the variables. Then, in the solution, constants are replaced by func-
tions of the original variables. The proof of Proposition 3.9 includes such
a transformation. Note that Equation 3.22 is solved taking $y$ as a con-
stant. This permits us to rewrite $\phi(x, y)$ so that it does not depend on $y$:
$\psi(x) = \phi(x, y)$. Such rewriting permits us to reduce Equation 3.22 to the
first Cauchy equation, and, thus, $\psi(x) = \alpha x$. However, as the constant $\alpha$
was a function of the selected constant $y$, the solution of Equation 3.22
corresponds to $\phi(x, y) = \alpha(y) x$.

Separation of variables: When some variables only appear on one side of an
equation, both sides can be rewritten as a function of common variables.
Proposition 3.9 also illustrates this case. In the proof of this proposition,
Equation 3.24 considers the following equality:

$$\phi(x, y) = \alpha(y) \cdot x = \beta(x) \cdot y.$$

As this equation can be rewritten in a way that both sides do not share a
common variable,

$$\frac{\phi(x, y)}{x y} = \frac{\alpha(y)}{y} = \frac{\beta(x)}{x},$$

it means that $\phi(x, y)/(x y)$ is equal to a constant $k$. Thus, $\phi(x, y) = k x y$.





When solving functional equations, these techniques are not used in isola- 
tion but combined together. Proposition 3.11 and, to a small extent, Propo- 
sition 3.9 illustrate this situation. 
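
A candidate solution obtained with these techniques can be checked numerically before a proof is attempted. The following Python sketch is only an illustration: the helper `holds`, the sampling range, and the constant `ALPHA` are assumptions made for the example and do not come from the text. It tests the first Cauchy equation and the logarithm equation (Equation 3.15) on random points, and shows that a non-solution is rejected.

```python
import math
import random

ALPHA = 1.7  # arbitrary constant, chosen only for the illustration


def holds(equation, samples, tol=1e-9):
    """Return True if the two sides of `equation` agree on every sample."""
    for args in samples:
        lhs, rhs = equation(*args)
        if abs(lhs - rhs) > tol * max(1.0, abs(lhs), abs(rhs)):
            return False
    return True


rng = random.Random(0)  # fixed seed so that the check is reproducible
pairs = [(rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)) for _ in range(1000)]

# First Cauchy equation f(x + y) = f(x) + f(y), candidate f(x) = ALPHA * x.
cauchy_ok = holds(lambda x, y: (ALPHA * (x + y), ALPHA * x + ALPHA * y), pairs)

# Logarithm equation phi(x * y) = phi(x) + phi(y), candidate ALPHA * log(x).
log_ok = holds(
    lambda x, y: (ALPHA * math.log(x * y),
                  ALPHA * math.log(x) + ALPHA * math.log(y)),
    pairs,
)

# A non-solution: f(x) = x**2 does not satisfy the first Cauchy equation.
square_ok = holds(lambda x, y: ((x + y) ** 2, x ** 2 + y ** 2), pairs)

print(cauchy_ok, log_ok, square_ok)
```

A passing check of this kind is evidence that a candidate is plausible; it does not replace the proofs given in this chapter.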


3.4 Bibliographical Notes 


1. Functional Equations: Sections 3.1 and 3.2 are based on Aczél's works
(in particular, on his books [4] and [6]). Aczél [4] gives extensive
references and historical remarks about the development of the field. [4] is an
extended and updated (1966) translation of [3]. Formalization of
functional equations is given in [4]. Most results (with corresponding proofs)
in this chapter are given in both [4] and [6]. [65] is a more recent book
on functional equations that also includes most of the examples in this
chapter. In particular, the example on functional equations for
aggregation (Proposition 3.11) in Section 3.2 is taken from [6] (p. 2; see also [65],
p. 157), and the area of the rectangle (from Legendre [223], pp. 293-294) is
given in both [4] and [65]. Cauchy's equations were formulated and solved
under some restrictive conditions in [66] (pp. 104-113). The Jensen equation
is formulated and solved in [204] (p. 176) (see [203]).

Section 3.3, on the most common methods for solving functional equa- 
tions, is based on [63] (or [64]). 

In Chapter 4, we review some other results about the use of functional 
equations for defining aggregation operators. More references on func- 
tional equations are given in that chapter. 
Synthesis of Judgements


Mesclar ous amb cargols¹


Catalan saying 


In this chapter we study some aggregation operators for numerical informa- 
tion. The description is focused on results based on functional equations. 
Therefore, not only are the operators given, but also, at least for some of 
them, their characterization. We refer to these results as syntheses of judge- 
ments. Although the term could be used for any aggregation operator, we 
restrict its use to the case of characterizations using functional equations. 

To describe the main results, we will assume that there is a set of
information sources, denoted by $X = \{x_1, \ldots, x_N\}$, and that each source $x_i$ supplies
a numerical value $a_i$. To simplify definitions, we assume that $a_i$ belongs either
to the unit interval $I = [0, 1]$ or to the positive real line $\mathbb{R}^+ = [0, \infty)$. For some
aggregation operators, we will exclude zero. Note that in some of the results
described, other domains are also appropriate (e.g., the whole real line $\mathbb{R}$).

The value $a_i$ supplied by the information source $x_i$ can be expressed by
means of a function $f$ that assigns the value $a_i$ to $x_i$. That is, $f(x_i) = a_i$ for
all $i \in \{1, \ldots, N\}$.

Using this notation, an aggregation operator is a function $C(a_1, \ldots, a_N)$,
or, equivalently, $C(f(x_1), \ldots, f(x_N))$, that takes $N$ numerical values and
returns another numerical value. Here, as in Section 1.1, we use $C$ for Consensus.


4.1 Associativity 


From a technical point of view, associativity is one of the most important
properties when defining operators. This is so because it permits the definition
of an N-dimensional operator from a two-dimensional one. For example, when
$C$ is associative, the expression can be computed either using

$$C(C(\cdots C(C(x_1, x_2), x_3) \cdots), x_N)$$

or

$$C(x_1, C(x_2, \cdots, C(x_{N-1}, x_N))).$$

This section is devoted to the study of associativity and the review of some
results.

¹ Mix eggs with snails.
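
The two evaluation orders can be sketched in Python as a left fold and a right fold. This is a minimal illustration, not part of the original text: the function names `fold_left` and `fold_right` and the sample values are assumptions. The algebraic sum $a + b - ab$ is associative, so both folds agree; the two-input arithmetic mean is not, so they differ.

```python
from functools import reduce


def fold_left(op, values):
    """Compute op(op(op(x1, x2), x3), ..., xN)."""
    return reduce(op, values)


def fold_right(op, values):
    """Compute op(x1, op(x2, ..., op(xN-1, xN)))."""
    return reduce(lambda acc, x: op(x, acc), reversed(values))


def algebraic_sum(a, b):
    """Associative example: a + b - ab."""
    return a + b - a * b


def pairwise_mean(a, b):
    """Non-associative example: (a + b) / 2."""
    return (a + b) / 2


values = [0.2, 0.5, 0.9, 0.4]
print(fold_left(algebraic_sum, values), fold_right(algebraic_sum, values))
print(fold_left(pairwise_mean, values), fold_right(pairwise_mean, values))
```

For the associative operator both folds give the same value (up to rounding), which is why a two-input definition is enough to specify the N-input operator.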

Before going into details on aggregation operators, we review a character- 
ization for associative operators. 


Theorem 4.1. Let $I$ be the unit interval; then, $\circ : I \times I \to I$ is a continuous
operation in $I$ that satisfies

$$(a \circ b) \circ c = a \circ (b \circ c) \tag{4.1}$$

and is cancellative (i.e., $a_1 \circ b = a_2 \circ b$ or $b \circ a_1 = b \circ a_2$ implies $a_1 = a_2$ for
any $b \in I$) if, and only if, there exists a continuous and monotone function
$\phi : J \to I$ ($J$ has to be open at least from one side) such that

$$a \circ b = \phi(\phi^{-1}(a) + \phi^{-1}(b)). \tag{4.2}$$


Proof. It is obvious that $\circ : I \times I \to I$ is cancellative if and only if $\circ$ is either
strictly monotone increasing or strictly monotone decreasing.

We will prove below the case of $\circ$ being strictly monotone increasing. The
case of strictly monotone decreasing is similar to the increasing one, and it
will not be considered here.

Let $0 < x < 1$ be a positive number; then, since $\circ$ is strictly monotone
increasing, we have

$$x \circ x < x. \tag{4.3}$$

Let $0 < c < 1$ be fixed. Then, it follows from associativity that $c \circ c \circ \cdots \circ c$
is well defined. Using this property, we define the function $f : \mathbb{N} \to (0, 1)$ by

$$f(n) = \underbrace{c \circ c \circ \cdots \circ c}_{n},$$

where $\mathbb{N}$ is the set of positive integers. It is obvious from the definition of $f$
that $f(n + m) = f(n) \circ f(m)$ for $m, n \in \mathbb{N}$, and that $f$ is strictly monotone
decreasing from the fact that $c \circ c < c$.

Next, it follows from the continuity and monotonicity of $\circ$ that there exists
$0 < a_2 < 1$ such that $a_2 \circ a_2 = c$. In the same way, there exists $0 < a_n < 1$
such that

$$c = \underbrace{a_n \circ a_n \circ \cdots \circ a_n}_{n}.$$

Let us define the function $f : \mathbb{Q}^+ \to (0, 1)$ by

$$f\Big(\frac{m}{n}\Big) = \underbrace{a_n \circ a_n \circ \cdots \circ a_n}_{m},$$

where $\mathbb{Q}^+$ is the set of positive rational numbers.

It follows from the strict monotonicity that $f$ is well defined. In fact,
we have $c = a_2 \circ a_2 = a_4 \circ a_4 \circ a_4 \circ a_4$. If $a_2 < a_4 \circ a_4$, then we have
$a_2 \circ a_2 < a_4 \circ a_4 \circ a_4 \circ a_4$. That is a contradiction.

It follows from the definition of $f$ that we have $f(p + q) = f(p) \circ f(q)$ for
$p, q \in \mathbb{Q}^+$ and that $f$ is strictly monotone decreasing.

Now, let us consider irrational numbers $x$. In this case, there exists a
sequence of rational numbers $r_n$ that tends to $x$. Let us define the function $f : [0, \infty) \to (0, 1]$
by $f(x) = \lim_{n \to \infty} f(r_n)$ and $f(0) = 1$. $f$ is well defined from the strict
monotonicity and continuity of $\circ$, and we have $f(x + y) = f(x) \circ f(y)$ for
$x, y > 0$, and that $f$ is strictly monotone decreasing. Then, if we define a
function $\phi : (0, 1] \to [0, \infty)$ by $\phi = f^{-1}$, we have $\phi^{-1}(\phi(x) + \phi(y)) = x \circ y$ for
$x, y \in (0, 1]$.

Finally, we can add $0$ to the domain of $\phi$ with $\phi(0) = \infty$.


As we have just seen, the inequality given in Equation 4.3 and obtained 
from the cancellative condition plays an important role in the proof. Equa- 
tion 4.3 corresponds to the condition for subidempotency of Archimedean 
t-norms. 

We can prove the strictly monotone decreasing case of $\circ$ in a similar way. In
such a proof, the inequality $x \circ x > x$ plays the central role. This corresponds
to the superidempotency of Archimedean t-conorms. A proof similar to the
one of Theorem 4.1 can be given for Theorems 2.46 and 2.49.

The function $\phi$ in Equation 4.2 is unique up to a linear transformation of
the variable. That is, $\phi(x)$ might be replaced by $\phi(\alpha x)$ for $\alpha \neq 0$, but no other
function is possible. Note that although $\circ$ was not supposed to be symmetric,
this follows from Theorem 4.1. Therefore, under the conditions of the theorem,
associativity implies symmetry (i.e., $x \circ y = y \circ x$).

Associative operators have been extensively studied in the literature. Some 
examples are presented below. The t-norms and t-conorms given in Defini- 
tions 2.43 and 2.47 are other examples of such associative operators. 


Example 4.2. Let $\gamma > 0$, let

$$\phi(a) = 1 - \frac{\gamma}{e^{a} + \gamma - 1}$$

and, therefore, let

$$\phi^{-1}(a) = \log\Big(\frac{\gamma}{1 - a} + 1 - \gamma\Big).$$

Under these conditions, the Hamacher family of associative operators is
defined as follows:

$$C_\gamma(a, b) = \frac{a + b + (\gamma - 2) a b}{1 + (\gamma - 1) a b}.$$

Note that the operators in the Hamacher family do not satisfy unanimity
($C_\gamma(a, a) \neq a$). In fact, this function is a t-conorm (Definition 2.47) when
values are restricted to $[0, 1]$, and, thus, $\max \leq C_\gamma$. In the particular case of
$\gamma = 1$, it reduces to the algebraic sum

$$C_1(a, b) = a + b - ab.$$

For $\gamma = 2$, and considering values in $[-1, 1]$, this function corresponds to the
rule for combining certainty factors in the expert system PROSPECTOR:

$$C_2(a, b) = \frac{a + b}{1 + ab}. \tag{4.4}$$

This expression also corresponds to the rule for combining velocities in the
theory of relativity.

For $\gamma = \infty$, we have that $C_\infty$ corresponds to $\perp_{\max}$ (Definition 2.14).
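
The relation between the generator and the closed form of the Hamacher family can be checked numerically. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the generator is taken as $\phi(a) = 1 - \gamma/(e^a + \gamma - 1)$, and inputs are restricted to $[0, 1)$ so that the inverse generator is finite.

```python
import math


def phi(x, gamma):
    """Generator: maps [0, inf) into [0, 1)."""
    return 1.0 - gamma / (math.exp(x) + gamma - 1.0)


def phi_inv(a, gamma):
    """Inverse of phi, defined for a in [0, 1)."""
    return math.log(gamma / (1.0 - a) + 1.0 - gamma)


def hamacher_from_generator(a, b, gamma):
    """a o b = phi(phi^{-1}(a) + phi^{-1}(b)), as in Equation 4.2."""
    return phi(phi_inv(a, gamma) + phi_inv(b, gamma), gamma)


def hamacher(a, b, gamma):
    """Closed form of the Hamacher family."""
    return (a + b + (gamma - 2.0) * a * b) / (1.0 + (gamma - 1.0) * a * b)


# The generator form and the closed form agree on a few sample points.
for gamma in (0.5, 1.0, 2.0, 5.0):
    for a, b in ((0.1, 0.3), (0.5, 0.5), (0.2, 0.9)):
        assert abs(hamacher_from_generator(a, b, gamma) - hamacher(a, b, gamma)) < 1e-9

print(hamacher(0.3, 0.6, 1.0))  # algebraic sum a + b - ab
print(hamacher(0.3, 0.6, 2.0))  # (a + b) / (1 + ab)
```

Because the operator is written as $\phi(\phi^{-1}(a) + \phi^{-1}(b))$, associativity and symmetry are inherited from ordinary addition.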


4.1.1 Uninorms and Nullnorms 


Uninorms and nullnorms are two other well-known families of associative op- 
erators. Nevertheless, it has to be said that, in general, these operators are not 
aggregation operators, in the sense that their outcome is not always between 
the minimum and the maximum. That is, they do not satisfy Equation 1.1 in 
Section 1.1: 


$$\min(a_1, \ldots, a_N) \leq C(a_1, \ldots, a_N) \leq \max(a_1, \ldots, a_N). \tag{4.5}$$


Uninorms 


We now start reviewing some results concerning uninorms. Nullnorms are 
described in the next section. 


Definition 4.3. An operator $UN$ from $[0, 1]$ into $[0, 1]$ is a UniNorm if it is
associative, symmetric, and has a neutral element $e$ in $(0, 1)$ (i.e., $UN(a, e) = a$).


Although, as stated, uninorms do not, in general, satisfy Equation 4.5, this
equation holds when the neutral element is such that

$$\min(a_1, \ldots, a_N) \leq e \leq \max(a_1, \ldots, a_N).$$

Equivalently, Equation 4.5 holds when there are some $a_i \geq e$ and some $a_j \leq e$.

An important result about uninorms is that two families of uninorms can
be constructed in an easy way from t-norms and t-conorms. In short, the
aggregation is defined conjunctively (in terms of a t-norm) when all the values
to be aggregated are small (less than the neutral element), and disjunctively
(in terms of a t-conorm) when all the values are large. In the remaining cases,
which correspond to conflicts between both large and small values, either the
minimum or the maximum can be applied. One family of uninorms
corresponds to the case of selecting the minimum and the other to the case of
selecting the maximum.


Fig. 4.1. Two families of uninorms

Figure 4.1 gives a graphical representation of both families in the case 
where only two inputs are considered. Recall that, due to associativity, the 
definition for two inputs is enough for specifying an N input uninorm. Propo- 
sition 4.4 gives a formalization of this graphical representation, and gives the 
proper construction for both families. It shows how to build uninorms from a 
given pair $(T, \perp)$, where $T$ is a t-norm and $\perp$ is a t-conorm.


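Proposition 4.4. Given a t-norm $T$, a t-conorm $\perp$, and $e \in [0, 1]$, the
following two expressions are uninorms:

$$UN_{e,T,\perp}(a_1, \ldots, a_N) = \begin{cases}
e \cdot T(a_1/e, \ldots, a_N/e) & \text{if } \max_i a_i \leq e \\
e + (1 - e) \cdot \perp\big(\frac{a_1 - e}{1 - e}, \ldots, \frac{a_N - e}{1 - e}\big) & \text{if } \min_i a_i \geq e \\
\min(a_1, \ldots, a_N) & \text{otherwise.}
\end{cases} \tag{4.6}$$

$$UN^{e,T,\perp}(a_1, \ldots, a_N) = \begin{cases}
e \cdot T(a_1/e, \ldots, a_N/e) & \text{if } \max_i a_i \leq e \\
e + (1 - e) \cdot \perp\big(\frac{a_1 - e}{1 - e}, \ldots, \frac{a_N - e}{1 - e}\big) & \text{if } \min_i a_i \geq e \\
\max(a_1, \ldots, a_N) & \text{otherwise.}
\end{cases} \tag{4.7}$$


Among the uninorms that can be expressed using the proposition above,
two special cases are distinguished. They correspond to the case of the pair
$(T, \perp)$ where the t-norm $T$ is the minimum and where the t-conorm $\perp$ is
the maximum. We will denote these uninorms by $UN_e = UN_{e,\min,\max}$ and
$UN^e = UN^{e,\min,\max}$.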
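
The construction of Proposition 4.4 can be sketched in Python for two inputs. This is a minimal illustration under stated assumptions: the factory `make_uninorm` is a hypothetical name, $e$ is taken strictly between 0 and 1 so that the divisions are defined, and the product t-norm and the algebraic sum t-conorm are chosen only as examples.

```python
def make_uninorm(e, tnorm, tconorm, conflict):
    """Binary uninorm built from a t-norm and a t-conorm (0 < e < 1).

    `conflict` is min for Equation 4.6 and max for Equation 4.7.
    """
    def un(a, b):
        if max(a, b) <= e:   # both values small: conjunctive region
            return e * tnorm(a / e, b / e)
        if min(a, b) >= e:   # both values large: disjunctive region
            return e + (1 - e) * tconorm((a - e) / (1 - e), (b - e) / (1 - e))
        return conflict(a, b)  # one small and one large value
    return un


def product(a, b):          # a t-norm
    return a * b


def algebraic_sum(a, b):    # a t-conorm
    return a + b - a * b


un_min = make_uninorm(0.6, product, algebraic_sum, min)
un_max = make_uninorm(0.6, product, algebraic_sum, max)

print(un_min(0.3, 0.3), un_min(0.8, 0.8))  # below min(a, b), above max(a, b)
print(un_min(0.3, 0.9), un_max(0.3, 0.9))  # conflict region: 0.3 and 0.9
```

The non-strict comparisons matter: with $a \leq e$ and the second input equal to $e$, the first branch returns $a$, so $e$ acts as the neutral element in both families.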
Example 4.5. Let us consider five search engines, each returning a list of
relevant URLs together with a relevance degree. Relevance degrees are given in
the unit interval. Then, let us consider the merging of such results, taking into
account the following:


• All items retrieved (by any of the search engines) will be in the final list.
• A new relevance degree will be computed for each item. This index is
defined solely as the combination of the degrees assigned by the five search
engines.
• Unavailable degrees will be considered as equal to zero.
• For combining the degrees we consider 0.6 as our central value. For values
lower than 0.6, we require negative (towards zero) interaction. For values
larger than 0.6, we require positive (towards one) interaction. In case of
conflict, to avoid misclassifications, we will assign the largest value to the
aggregation.


This situation can be modeled with a uninorm: $UN^{0.6,T,\perp}$ for a t-norm $T$ and a
t-conorm $\perp$. For example, using Yager's t-norm and t-conorm (Examples 2.45
and 2.48, respectively) we have

$$UN^{0.6,T,\perp}(a_1, \ldots, a_5) = \begin{cases}
0.6 \Big(1 - 1 \wedge \big(\sum_{i=1}^{5} (1 - a_i/0.6)^{w}\big)^{1/w}\Big) & \text{if } \max_i a_i \leq 0.6 \\
0.6 + 0.4 \Big(1 \wedge \big(\sum_{i=1}^{5} \big(\frac{a_i - 0.6}{0.4}\big)^{w}\big)^{1/w}\Big) & \text{if } \min_i a_i \geq 0.6 \\
a_1 \vee a_2 \vee a_3 \vee a_4 \vee a_5 & \text{otherwise.}
\end{cases} \tag{4.8}$$


All uninorms constructed using Proposition 4.4 use a minimum or a max- 
imum in the regions of conflict. That is, the minimum and the maximum are 
used to combine a pair (a,b), where a < e and b > e. Other uninorms can be 
defined so that such values are combined using an alternative function that is 
neither the maximum nor the minimum. 

The following proposition establishes how to construct uninorms of this 
form. 


Proposition 4.6. Let $e$ be a value in $[0, 1]$, and let $h$ in $[0, 1]$ be a strictly
increasing continuous mapping, with $h(0) = -\infty$, $h(e) = 0$, and $h(1) = +\infty$;
then, the binary operator $UN$ defined by

$$UN(a, b) = h^{-1}(h(a) + h(b)), \tag{4.9}$$

for all $(a, b)$ in $[0, 1] \times [0, 1] \setminus \{(0, 1), (1, 0)\}$, and for $(a, b)$ in $\{(0, 1), (1, 0)\}$
either $UN(0, 1) = UN(1, 0) = 0$ or $UN(1, 0) = UN(0, 1) = 1$, is a uninorm
with neutral element $e$. Uninorms of this form are strictly increasing and
continuous on $(0, 1)$.


Therefore, according to this proposition, given any function $h$ satisfying the
constraints above, we can construct two different uninorms using Equation 4.9.
One uninorm will have $UN(0, 1) = UN(1, 0) = 0$, and the other, $UN(0, 1) =
UN(1, 0) = 1$.
The example below shows such a construction for a particular function h. 


Example 4.7. Let $h_c(x)$ be defined for $x$ in $(0, 1)$ and for $c > 0$ as

$$h_c(x) = \log\Big(-\frac{1}{c} \log(1 - x)\Big).$$

Then, $h_c^{-1}(x)$ is

$$h_c^{-1}(x) = 1 - e^{-c e^{x}}.$$

Note that, for $h_c(x)$, the following hold: $h_c(0) = \lim_{x \to 0} h_c(x) = -\infty$,
$h_c(1) = \lim_{x \to 1} h_c(x) = \infty$, and $h_c(1 - e^{-c}) = 0$.
Then, using Proposition 4.6, the following expression is a uninorm with
neutral element $e_c = 1 - e^{-c}$:

$$UN_c(x, y) = 1 - e^{-\frac{1}{c} (\log(1 - x)) (\log(1 - y))}.$$
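
The construction of Example 4.7 can be followed step by step in Python. The sketch below is an illustration, with hypothetical function names and an arbitrary value of $c$; it takes $h_c(x) = \log(-\log(1 - x)/c)$, builds the uninorm as $h_c^{-1}(h_c(x) + h_c(y))$, and compares it with the closed form $1 - \exp(-\log(1 - x)\log(1 - y)/c)$ on the open interval $(0, 1)$.

```python
import math


def h(x, c):
    """Generator h_c(x) = log(-(1/c) * log(1 - x)) for x in (0, 1)."""
    return math.log(-math.log(1.0 - x) / c)


def h_inv(u, c):
    """Inverse generator: 1 - exp(-c * exp(u))."""
    return 1.0 - math.exp(-c * math.exp(u))


def un_generated(x, y, c):
    """Uninorm from Equation 4.9 with the generator h_c."""
    return h_inv(h(x, c) + h(y, c), c)


def un_closed(x, y, c):
    """Closed form: 1 - exp(-(1/c) * log(1 - x) * log(1 - y))."""
    return 1.0 - math.exp(-math.log(1.0 - x) * math.log(1.0 - y) / c)


c = 1.5                        # arbitrary positive parameter
e_c = 1.0 - math.exp(-c)       # neutral element
print(un_generated(0.3, 0.7, c), un_closed(0.3, 0.7, c))
print(un_closed(0.3, e_c, c))  # combining with e_c returns 0.3 up to rounding
```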


Another example of continuous uninorm is the following family of func- 
tions. 


Example 4.8. Let $\lambda > 0$; then, the following expression is a uninorm with
neutral element $e_\lambda = 1/(1 + \lambda)$:

$$D_\lambda(x, y) = \frac{\lambda x y}{\lambda x y + (1 - x)(1 - y)}.$$


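Proposition 4.9. The uninorm $D_\lambda$ satisfies:

1. $D_{\lambda_1} \leq D_{\lambda_2}$ if $\lambda_1 \leq \lambda_2$
2. $\lim_{\lambda \to 0} D_\lambda = T_{\min}$
3. $\lim_{\lambda \to \infty} D_\lambda = \perp_{\max}$
4. The only self-dual operator in this family is $D_1$. That is, $D_1(x, y) = 1 -
D_1(1 - x, 1 - y)$.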


Example 4.10. Let us reconsider Example 4.5. The use of maximum in the
conflicting regions leads to uninorms with discontinuity. In particular, we have
discontinuity in $(x, 0.6)$ for $x < 0.6$ (see Equation 4.8).

To avoid such discontinuities, we can use $D_\lambda(x, y)$. As the example requires
the neutral element to be 0.6, this means that

$$e_\lambda = 0.6 = 1/(1 + \lambda).$$

So, $\lambda = 1/0.6 - 1 = 2/3$.
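
The choice $\lambda = 2/3$ can be checked with a few lines of Python. This is only a sketch: the function name `d_lambda` is hypothetical, and the uninorm of Example 4.8 is taken as $D_\lambda(x, y) = \lambda x y / (\lambda x y + (1 - x)(1 - y))$, which is undefined at the pairs $(0, 1)$ and $(1, 0)$.

```python
def d_lambda(x, y, lam):
    """Continuous uninorm of Example 4.8 (undefined at (0, 1) and (1, 0))."""
    return lam * x * y / (lam * x * y + (1 - x) * (1 - y))


lam = 1 / 0.6 - 1  # neutral element 0.6 requires lambda = 2/3

print(d_lambda(0.3, 0.6, lam))  # neutral element: result is 0.3 up to rounding
print(d_lambda(0.3, 0.3, lam))  # both below 0.6: result below 0.3
print(d_lambda(0.8, 0.8, lam))  # both above 0.6: result above 0.8
print(d_lambda(0.3, 0.9, lam))  # conflict: result between 0.3 and 0.9
```

Unlike the maximum used in Example 4.5, the value in the conflict region changes smoothly with both inputs.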
Fig. 4.2. A generic family of nullnorms (left) and $NN_{e,\min,\max}$ (right)


Nullnorms 


Nullnorms are operators that are also associative and symmetric; we denote
them by $NN$. From the point of view of their properties, instead of having
a neutral element $e$ (as uninorms have), they are characterized by the
existence of an element $e$ such that, for all $x \leq e$, $NN(0, x) = x$, and for all $x \geq e$,
$NN(1, x) = x$. This is established in the following definition.


Definition 4.11. An operator $NN$ from $[0, 1]^N$ into $[0, 1]$ is a NullNorm if it
is associative, symmetric, and there is an element $e$ in $(0, 1)$ such that

• for all $a \leq e$, $NN(0, a) = a$
• for all $a \geq e$, $NN(1, a) = a$


The following result permits us to construct nullnorms from t-norms and
t-conorms.


Proposition 4.12. Let $T$, $\perp$, and $e$ be a t-norm, a t-conorm, and a value in
$(0, 1)$; then, the operator $NN_{e,T,\perp}$,

$$NN_{e,T,\perp}(a_1, \ldots, a_N) = \begin{cases}
e \cdot \perp(a_1/e, \ldots, a_N/e) & \text{if } \max_i a_i \leq e \\
e + (1 - e) \cdot T\big(\frac{a_1 - e}{1 - e}, \ldots, \frac{a_N - e}{1 - e}\big) & \text{if } \min_i a_i \geq e \\
e & \text{otherwise,}
\end{cases} \tag{4.10}$$

is a nullnorm.


Figure 4.2 illustrates this proposition as well as the particular nullnorm
$NN_{e,\min,\max}$ for the particular case of $N = 2$.

The following proposition shows that all nullnorms can be defined in terms
of t-norms, t-conorms, the neutral element, and the median operator.
Proposition 4.13. $NN$ is a nullnorm if and only if there is a t-norm $T$, a
t-conorm $\perp$, and a neutral element $e \in [0, 1]$ such that $NN$ can be expressed
as

$$NN(a_1, \ldots, a_N) = M(T(a_1, \ldots, a_N), \perp(a_1, \ldots, a_N), e),$$

where $M$ is the median operator.


Recall that the median of $(a, b, c)$ is the second largest value in $\{a, b, c\}$. Due
to the fact that $T(a_1, \ldots, a_N) \leq \perp(a_1, \ldots, a_N)$, this proposition is equivalent
to considering the following three cases:

• When $a_i \leq e$ for all $a_i$, the output is $\perp(a_1, \ldots, a_N)$
• When $a_i \geq e$ for all $a_i$, the output is $T(a_1, \ldots, a_N)$
• When some $a_i < e$ and some other $a_j > e$, the output is $e$

Therefore, nullnorms do not satisfy Equation 4.5 (internality). In fact,
internality is only satisfied when there are some $a_i < e$ and some other $a_j > e$.
However, in this case the output is just $e$.
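
A two-input nullnorm in the style of Proposition 4.12 can be sketched as follows. The Python code is an illustration under stated assumptions: `make_nullnorm` and `median3` are hypothetical names, $e$ is strictly between 0 and 1, and the pair (minimum, maximum) is used as t-norm and t-conorm. For this particular pair the result coincides with the median of $\min(a, b)$, $\max(a, b)$, and $e$.

```python
def make_nullnorm(e, tnorm, tconorm):
    """Binary nullnorm built from a t-norm and a t-conorm (0 < e < 1)."""
    def nn(a, b):
        if max(a, b) <= e:   # both values small: disjunctive region
            return e * tconorm(a / e, b / e)
        if min(a, b) >= e:   # both values large: conjunctive region
            return e + (1 - e) * tnorm((a - e) / (1 - e), (b - e) / (1 - e))
        return e             # conflict: the output is e
    return nn


def median3(a, b, c):
    """Second largest value among a, b, and c."""
    return sorted((a, b, c))[1]


nn = make_nullnorm(0.6, min, max)

print(nn(0.0, 0.3), nn(1.0, 0.8))  # NN(0, a) = a for a <= e, NN(1, a) = a for a >= e
print(nn(0.2, 0.9))                # conflict: 0.6
print(nn(0.2, 0.4), median3(min(0.2, 0.4), max(0.2, 0.4), 0.6))
```

In contrast with a uninorm, the output is pulled towards $e$: disjunctive below $e$, conjunctive above it.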


Uninorms vs. nullnorms 


Figure 4.3 illustrates that uninorms and nullnorms have complementary be- 
havior. Note that, for a,b < e, the uninorm behaves conjunctively (i.e., it 
yields an output smaller than or equal to min(a,b)) while the nullnorm be- 
haves disjunctively (i.e., it yields a value larger than or equal to max(a, b)). 
For a,b > e, the effect is the complement. Therefore, the nullnorm tends to 
concentrate the outcome around e, while the uninorm tends to move away 
from e. This fact is illustrated with arrows, either attracting to e (Figure 4.3, 
right) or moving away from e (Figure 4.3, left). 

This dual nature of uninorms, with conjunctive and disjunctive regions,
has been exploited in some applications. For example, the operators for
parallel combination of certainty factors in the expert systems MYCIN and
PROSPECTOR were uninorms: the certainty factors were defined in $[-1, 1]$,
and thus a transformation from $[-1, 1]$ to $[0, 1]$ is needed. In any case, these
operators had a conjunctive behavior for values $x, y$ in $[-1, 0]$, and a
disjunctive behavior for values $x, y$ in $[0, 1]$. For example, the combination function
of MYCIN (in $[-1, 1]$) was defined as

$$C(x, y) = \begin{cases}
x + y - x y & \text{if } 0 \leq \min(x, y) \\
\frac{x + y}{1 - \min(|x|, |y|)} & \text{if } \min(x, y) < 0 < \max(x, y) \\
x + y + x y & \text{if } \max(x, y) \leq 0.
\end{cases} \tag{4.11}$$





It can be observed that the expression used for combining positive values 
x,y is the algebraic t-conorm. So, as stated, this combination function has a 
disjunctive behavior in this region. 

Another example is the combination function in the expert system
PROSPECTOR (Equation 4.4) for values in $[-1, 1]$. This function, when data is mapped
into $[0, 1]$, corresponds to $D_\lambda$ with $\lambda = 1$.
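
Both combination functions can be written down in a few lines of Python. The sketch below rests on assumptions that should be read with care: the function names are hypothetical, the mixed-sign branch of `mycin` uses the formula $(x + y)/(1 - \min(|x|, |y|))$ commonly given for MYCIN, and the change of scale between $[-1, 1]$ and $[0, 1]$ is assumed to be the linear map $v \mapsto (v + 1)/2$.

```python
def mycin(x, y):
    """Parallel combination of certainty factors in [-1, 1] (Equation 4.11)."""
    if min(x, y) >= 0:
        return x + y - x * y
    if max(x, y) <= 0:
        return x + y + x * y
    # Mixed signs; undefined for the pairs (1, -1) and (-1, 1).
    return (x + y) / (1 - min(abs(x), abs(y)))


def prospector(a, b):
    """Combination rule of Equation 4.4, for values in (-1, 1)."""
    return (a + b) / (1 + a * b)


def d_lambda(x, y, lam):
    """Uninorm of Example 4.8 on the unit interval."""
    return lam * x * y / (lam * x * y + (1 - x) * (1 - y))


def to_unit(v):      # linear map from [-1, 1] to [0, 1]
    return (v + 1) / 2


def from_unit(u):    # inverse map from [0, 1] to [-1, 1]
    return 2 * u - 1


print(mycin(0.6, 0.5), mycin(-0.6, -0.5), mycin(0.6, -0.5))

# PROSPECTOR on [-1, 1] agrees with D_1 on [0, 1] after the change of scale.
a, b = 0.4, -0.7
print(prospector(a, b), from_unit(d_lambda(to_unit(a), to_unit(b), 1.0)))
```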
Fig. 4.3. Comparison between uninorms (left) and nullnorms (right)


4.2 Separability: the Quasi-arithmetic Means 


In this section we review some results that rely on associativity. In particular, 
we study aggregation operators C that are separable. Informally, C is separa- 
ble when the influence of each variable can be made explicit and translated 
into a new space where no interactions are considered. Then, as there are 
no interactions in such a space, values can be combined with an associative 
operator. This is formally stated as follows. 


Definition 4.14. $C(a_1, \ldots, a_N)$ is separable when there exist functions $g_1, \ldots, g_N$
and $\circ$ such that

$$C(a_1, \ldots, a_N) = g_1(a_1) \circ g_2(a_2) \circ \cdots \circ g_N(a_N),$$

with $\circ$ a continuous, associative, and cancellative operator.


Note that $g_i(a_i)$ only depends on the $i$th source (the source $x_i$ that delivers
$a_i$) and on the value $a_i$ supplied by it. As will be seen later on, the weighted
mean is an example of a separable function, where $g_i(a_i)$ is the value $a_i$
supplied by source $x_i$ prorated by a weight associated with $x_i$.


Proposition 4.15. An operator $C$ is separable in terms of monotone
increasing functions $g_1, \ldots, g_N$, and a continuous, associative, and cancellative $\circ$, if
and only if it is of the form

$$C(a_1, \ldots, a_N) = \phi^{-1}\Big(\sum_{i=1}^{N} \phi(g_i(a_i))\Big) = \phi^{-1}\Big(\sum_{i=1}^{N} \phi_i(a_i)\Big),$$

where $\phi_i(a) = \phi(g_i(a))$ for all $i = 1, \ldots, N$, and $\phi^{-1}$ is the inverse function
of $\phi$.

This proposition is based on Theorem 4.1 (on associativity).

Now, we turn to the case where all influences are measured in the same way.
That is, when $g_i = g_j = g$ for all $i, j$. In this case, the following proposition
holds.


Proposition 4.16. An operator $C$ is separable in terms of a unique monotone
increasing function $g$ if and only if it is of the form

$$C(a_1, \ldots, a_N) = \phi^{-1}\Big(\sum_{i=1}^{N} \phi(g(a_i))\Big). \tag{4.12}$$


Then, in aggregation, we expect the operator to satisfy unanimity: when all
information sources supply the same information (i.e.,
agree on a given value), the outcome should be this very value:

$$C(a, \ldots, a) = a. \tag{4.13}$$

This leads to the following result:


Proposition 4.17. An operator $C$ is separable in terms of a unique monotone
increasing $g$ and satisfies unanimity if and only if it is of the form:

$$C(a_1, \ldots, a_N) = \phi^{-1}\Big(\frac{1}{N} \sum_{i=1}^{N} \phi(a_i)\Big). \tag{4.14}$$


Proof. The proof of this proposition is based on the unanimity condition:
$a = C(a, \ldots, a) = \phi^{-1}\big(\sum_{i=1}^{N} \phi(g(a))\big)$. Therefore, $\phi(a) = N \phi(g(a))$ and, thus,
$g(a)$ corresponds to $\phi^{-1}(\phi(a)/N)$. By replacing $g(a)$ in Equation 4.12 by this
expression, so that $\phi(g(a_i)) = \phi(a_i)/N$, the proposition is proved.














Aggregation operators following Expression 4.14 are the quasi-arithmetic
means (also known as generalized $\phi$-means). Note that they have the form
of an arithmetic mean ($\frac{1}{N} \sum_{i=1}^{N} b_i$) once data is mapped by $\phi : I \to J$ into
the space $J$ ($b_i = \phi(a_i)$). Then, after aggregation the result is mapped back
(by $\phi^{-1}$) into the space $I$. The quasi-arithmetic mean is a family that
encompasses several well-known aggregation operators. Different $\phi$ lead to different
operators. In particular, $\phi(x) = x$ yields the arithmetic mean, $\phi(x) = \log(x)$
yields the geometric mean, and $\phi(x) = 1/x$ leads to the harmonic mean.
Table 4.1 gives the main quasi-arithmetic means. Trigonometric means, which
correspond to $\phi$ equal to sin, cos, or tan, are not displayed, although they
have also been studied in the literature.
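
Expression 4.14 translates almost literally into code. The following Python sketch is an illustration with hypothetical function names: a generic `quasi_arithmetic_mean` takes the generator and its inverse as arguments, and the arithmetic, geometric, and harmonic means are obtained by plugging in $x$, $\log x$, and $1/x$. Positive inputs are assumed for the last two.

```python
import math


def quasi_arithmetic_mean(values, phi, phi_inv):
    """phi^{-1} of the arithmetic mean of the phi-transformed values."""
    return phi_inv(sum(phi(a) for a in values) / len(values))


def arithmetic_mean(values):
    return quasi_arithmetic_mean(values, lambda x: x, lambda y: y)


def geometric_mean(values):  # requires positive values
    return quasi_arithmetic_mean(values, math.log, math.exp)


def harmonic_mean(values):   # requires positive values
    return quasi_arithmetic_mean(values, lambda x: 1.0 / x, lambda y: 1.0 / y)


values = [1.0, 2.0, 4.0]
hm = harmonic_mean(values)
gm = geometric_mean(values)
am = arithmetic_mean(values)
print(hm, gm, am)  # increasing order: harmonic, geometric, arithmetic
```

Passing the generator as a parameter keeps the mapping into the space $J$, the averaging, and the mapping back into $I$ visibly separate.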

A few properties have been proved for such means. Some of them, corre- 
sponding to characterizations of the means, are given in the next sections. A 
well-known property is that the harmonic (H M), the geometric (GM), and 
the arithmetic mean (AM) satisfy the following relation: 
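$$HM \leq GM \leq AM.$$


Name                      Generator function            $C(a_1, \ldots, a_N)$
Arithmetic mean           $\phi(x) = x$                 $\frac{1}{N} \sum_{i=1}^{N} a_i$
Geometric mean            $\phi(x) = \log x$            $\sqrt[N]{\prod_{i=1}^{N} a_i}$
Harmonic mean             $\phi(x) = 1/x$               $N \big/ \sum_{i=1}^{N} \frac{1}{a_i}$
Root-mean-square          $\phi(x) = x^2$               $\sqrt{\frac{1}{N} \sum_{i=1}^{N} a_i^2}$
Root-mean-power           $\phi(x) = x^\alpha$          $\big(\frac{1}{N} \sum_{i=1}^{N} a_i^\alpha\big)^{1/\alpha}$
Exponential mean          $\phi(x) = e^{\alpha x}$      $\frac{1}{\alpha} \log\big(\frac{1}{N} \sum_{i=1}^{N} e^{\alpha a_i}\big)$
Radical mean              $\phi(x) = \alpha^{1/x}$      $\big(\log_\alpha\big(\frac{1}{N} \sum_{i=1}^{N} \alpha^{1/a_i}\big)\big)^{-1}$
Basis-exponential mean    $\phi(x) = x^x$               $m$ s.t. $m^m = \frac{1}{N} \sum_{i=1}^{N} a_i^{a_i}$
Basis-radical mean        $\phi(x) = x^{1/x}$           $m$ s.t. $m^{1/m} = \frac{1}{N} \sum_{i=1}^{N} a_i^{1/a_i}$


Table 4.1. Main quasi-arithmetic means. For basis-exponential and basis-radical
mean, the values should satisfy $a_i > 1/e$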


Figure 4.4 illustrates this property for two inputs. Note that, for a pair
$a, b$, AM represents $(a + b)/2$, GM represents $\sqrt{ab}$, and HM corresponds to
$2ab/(a + b)$. The segment $(a - b)/2$ that is used for the computation of GM
is also given in the figure.

Root-mean-powers (also known as the rth power mean or generalized
mean) are another example of quasi-arithmetic means. They are obtained
from Equation 4.14 with a generating function of the form $\phi(x) = x^\alpha$. We will
use $RMP_\alpha$ to denote the operators for a particular $\alpha$. So, according to the
previous definition:

$$RMP_\alpha(a_1, \ldots, a_N) = \Big(\frac{1}{N} \sum_{i=1}^{N} a_i^\alpha\Big)^{1/\alpha}.$$


The following proposition can be proved for root-mean-powers: 


Proposition 4.18. Let $RMP_\alpha$ be the root-mean-power with parameter $\alpha$.
Then, the following holds:

• $RMP_r \leq RMP_s$ for $r < s$
• $\lim_{\alpha \to 0} RMP_\alpha = GM$
• $\lim_{\alpha \to \infty} RMP_\alpha = \max$
• $\lim_{\alpha \to -\infty} RMP_\alpha = \min$
• for $\alpha > 1$, we have (Minkowski's inequality)

$$\Big(\sum_{i=1}^{N} (a_i + b_i)^\alpha\Big)^{1/\alpha} \leq \Big(\sum_{i=1}^{N} a_i^\alpha\Big)^{1/\alpha} + \Big(\sum_{i=1}^{N} b_i^\alpha\Big)^{1/\alpha}$$

and, for $\alpha < 1$, we have

$$\Big(\sum_{i=1}^{N} (a_i + b_i)^\alpha\Big)^{1/\alpha} \geq \Big(\sum_{i=1}^{N} a_i^\alpha\Big)^{1/\alpha} + \Big(\sum_{i=1}^{N} b_i^\alpha\Big)^{1/\alpha}.$$

Naturally, equality holds for $\alpha = 1$.


Fig. 4.4. Graphical illustration that, for a pair $(a, b)$ with $a > b$, it holds
that $HM(a, b) \leq GM(a, b) \leq AM(a, b)$, where HM stands for the harmonic mean,
GM stands for the geometric mean, and AM stands for the arithmetic mean
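
The monotonicity and the limits stated for root-mean-powers can be observed numerically. The Python sketch below is an illustration with hypothetical names; it assumes positive inputs and a nonzero parameter, and it approximates the limits with very small and very large parameter values rather than computing them exactly.

```python
import math


def root_mean_power(values, alpha):
    """RMP_alpha for positive values and alpha != 0."""
    return (sum(a ** alpha for a in values) / len(values)) ** (1.0 / alpha)


def geometric_mean(values):
    return math.exp(sum(math.log(a) for a in values) / len(values))


values = [1.0, 2.0, 4.0]

# Monotonicity in the parameter: RMP_r <= RMP_s for r < s.
alphas = [-5, -1, 0.5, 1, 2, 5]
results = [root_mean_power(values, a) for a in alphas]
assert all(x <= y for x, y in zip(results, results[1:]))

# Limits, approximated with small and large parameters.
print(root_mean_power(values, 1e-6), geometric_mean(values))
print(root_mean_power(values, 200))   # approaches max(values)
print(root_mean_power(values, -200))  # approaches min(values)
```

The convergence to the maximum and the minimum is slow, so the printed values only approximate 4 and 1.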


4.3 Aggregation and Measurement Scales 


When measurement scales are taken into consideration, the outcome of a
function should be consistent with the scale, and changes to the scale should
lead to consistent changes on the outcome. For example, if we are aggregating
data representing the execution time of a particular program on different
computers, the outcome of the aggregation should not depend on whether the
time is expressed in hours, minutes, or seconds.


Name                $\alpha$          $C(a_1, \ldots, a_N)$
Arithmetic mean     $\alpha = 1$      $\frac{1}{N} \sum_{i=1}^{N} a_i$
Root-mean-square    $\alpha = 2$      $\sqrt{\frac{1}{N} \sum_{i=1}^{N} a_i^2}$
Harmonic mean       $\alpha = -1$     $N \big/ \sum_{i=1}^{N} \frac{1}{a_i}$


Table 4.2. Some root-mean-powers

Such an idea of consistency is formalized in terms of the permissible
transformations in the desired scale. Then, when a function behaves properly with
respect to a set of transformations $\Phi$, we say that the function is $\Phi$-invariant
or $\Phi$-meaningful. This concept is defined below.


Definition 4.19. Let $\Phi$ be a set of transformations on $E$; then, a function
$C : E^N \to E$ is $\Phi$-invariant (or $\Phi$-meaningful or $\Phi$-ordinally stable)
if, for any $\phi \in \Phi(E)$ and any $(x_1, \ldots, x_N) \in E^N$, we have

$$C(\phi(x_1), \ldots, \phi(x_N)) = \phi(C(x_1, \ldots, x_N)).$$


Now, we study aggregation operators in some particular scales. The
operators should be invariant to permissible transformations. For example, in the
case of ratio scales, aggregation should be invariant to transformations of the
form $\phi(x) = r x$ for positive $r$. This property, known as positive homogeneity,
is mathematically expressed as follows:

$$C(r a_1, \ldots, r a_N) = r C(a_1, \ldots, a_N) \tag{4.15}$$

for $r > 0$.

We will consider below not only conditions corresponding to permissible
transformations, but also others that seem reasonable for aggregation. This is
the case for unanimity.

Let us return to positive homogeneity. We will give below some
results satisfying this property.


Proposition 4.20. An operator $C$ is separable in terms of a unique monotone
increasing $g$ and satisfies unanimity and positive homogeneity if and only if $C$
is either the root-mean-power with parameter $\alpha \neq 0$ ($RMP_\alpha$) or the geometric
mean.
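
Positive homogeneity (Equation 4.15) can be tested directly on sample data. The following Python sketch is an illustration with hypothetical names and made-up execution times: it checks that a root-mean-power and the geometric mean give consistent results when seconds are converted into minutes, while the quasi-arithmetic mean generated by $e^x$ (used here as a counterexample) does not.

```python
import math


def root_mean_power(values, alpha):
    """RMP_alpha for positive values and alpha != 0."""
    return (sum(a ** alpha for a in values) / len(values)) ** (1.0 / alpha)


def geometric_mean(values):
    return math.exp(sum(math.log(a) for a in values) / len(values))


def exponential_mean(values):
    """Quasi-arithmetic mean with generator exp(x)."""
    return math.log(sum(math.exp(a) for a in values) / len(values))


def is_homogeneous(c, values, r, tol=1e-9):
    """Check C(r*a_1, ..., r*a_N) = r*C(a_1, ..., a_N) for one r > 0."""
    scaled = c([r * a for a in values])
    return abs(scaled - r * c(values)) <= tol * max(1.0, abs(scaled))


seconds = [120.0, 30.0, 600.0]  # hypothetical execution times
r = 1 / 60                      # change of unit: seconds to minutes

print(is_homogeneous(lambda v: root_mean_power(v, 2), seconds, r))
print(is_homogeneous(geometric_mean, seconds, r))
print(is_homogeneous(exponential_mean, seconds, r))
```

A check with a single factor $r$ can only refute homogeneity; the positive cases follow from the proposition, not from the test.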
Runtime system            227-mtrt  202-jess  201-compress  209-db  222-mpegaudio  228-jack  213-javac  GM
Sun JDK 1.5.0 Client VM   325       221       204           43.4    251            192       96.1       162.13
Sun JDK 1.4.2 Client VM   318       186       199           43.6    249            181       90.9       154.51
Kaffe                     32        21.3      191           24.8    101            32.9      21.3       41.95


Table 4.3. Performance comparison on a Pentium 4 computer. Execution times of
the seven benchmark programs in SPEC JVM98 for three Java runtime systems.
Times are given in seconds. The last column gives the average time using the geometric mean (GM)

Recall that according to Proposition 4.18, the geometric mean corresponds
to $\lim_{\alpha \to 0} RMP_\alpha$.

Table 4.2 gives some of the aggregation functions that can be expressed as 
root-mean-powers for some values of a. 


Example 4.21. Let us consider the problem of assessing the performance of
some Java runtime systems. To do so, each program from the SPEC JVM98
is executed on each runtime system. Then, such execution times should be
aggregated. Table 4.3 represents the execution times of the seven benchmark
programs in SPEC JVM98 for three Java runtime systems.

Then, when a separable operator $C$ is used for computing the mean value,
$C$ should be either the root-mean-power or the geometric mean, as $C$ should
satisfy unanimity and positive homogeneity. This is because, when execution
times are equal, the outcome should correspond to such a value, and because
the value should be consistent when time is represented in seconds, minutes,
or any other unit.

Table 4.3 gives the performance of the runtime systems when the
aggregation function is the geometric mean.
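
The GM column of Table 4.3 can be recomputed from the seven execution times of each runtime system. The Python sketch below is an illustration: the dictionary layout and the names are assumptions, while the numbers are those of the table. It also shows that changing the unit from seconds to minutes rescales every geometric mean by the same factor, so the comparison between systems is unaffected.

```python
import math

# Execution times in seconds, in the column order of Table 4.3.
times = {
    "Sun JDK 1.5.0 Client VM": [325, 221, 204, 43.4, 251, 192, 96.1],
    "Sun JDK 1.4.2 Client VM": [318, 186, 199, 43.6, 249, 181, 90.9],
    "Kaffe": [32, 21.3, 191, 24.8, 101, 32.9, 21.3],
}


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


gms = {name: geometric_mean(t) for name, t in times.items()}
for name, value in gms.items():
    print(f"{name}: {value:.2f}")

# Same data in minutes: each geometric mean is divided by 60.
gms_minutes = {name: geometric_mean([v / 60 for v in t]) for name, t in times.items()}
```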


Another property is reciprocity, which might be required in ratio scales. 
It is expressed as follows: 


$$C(1/a_1, \ldots, 1/a_N) = 1/C(a_1, \ldots, a_N). \tag{4.16}$$
It is illustrated with an example. 


Example 4.22. Let us consider the case of assessing the relative importance
of two criteria (price vs. comfort) for buying a particular product. Then, if
$a_1, \ldots, a_5$ are the subjective evaluations from five members of the same family,
we can assess their relative importance by $C(a_1, \ldots, a_5)$.

Reciprocity establishes in this setting that the aggregation is consistent, 
considering either price vs. comfort (when values a; are used) or comfort vs. 
price (when values 1/a; are used). 
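
As a small numerical illustration of reciprocity, the sketch below (in Python, with hypothetical evaluations chosen only for the example) compares the geometric and the arithmetic mean. The geometric mean of the reciprocals equals the reciprocal of the geometric mean; the arithmetic mean does not have this property.

```python
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    return sum(values) / len(values)

# Hypothetical price-vs-comfort ratio judgements from five family members.
a = [2.0, 0.5, 4.0, 1.0, 3.0]
inv = [1 / x for x in a]  # the same judgements read as comfort vs. price

# Gap between C(1/a_1, ..., 1/a_N) and 1/C(a_1, ..., a_N).
gm_gap = abs(geometric_mean(inv) - 1 / geometric_mean(a))
am_gap = abs(arithmetic_mean(inv) - 1 / arithmetic_mean(a))

print("geometric mean gap:", gm_gap)
print("arithmetic mean gap:", am_gap)
```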


When reciprocity is added to unanimity and separability, the following
result can be proved:

Proposition 4.23. An operator C is separable in terms of a unique monotone
increasing g and satisfies unanimity and reciprocity (Equation 4.16) if and
only if C is of the form

    $C(a_1, \dots, a_N) = \exp \omega^{-1}\left( \frac{1}{N} \sum_{i=1}^{N} \omega(\log(a_i)) \right)$    (4.17)

with $\omega$ an arbitrary odd function. Here exp is the exponential function, and
$\exp a$ corresponds to $e^a$.


Now, we consider, at the same time, positive homogeneity and reciprocity. 


Proposition 4.24. An operator C is separable in terms of a unique monotone
increasing g and satisfies unanimity, positive homogeneity (Equation 4.15),
and reciprocity (Equation 4.16) if and only if C is the geometric mean:

    $C(a_1, \dots, a_N) = \left( \prod_{i=1}^{N} a_i \right)^{1/N}$


In Section 2.1.3, it was said that the permissible transformations in interval
scales are of the form $x \mapsto \alpha x + \beta$. When the data to be aggregated belong
to an interval scale, the following proposition can be applied. Note that here
we drop the condition about separability.


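Proposition 4.25. For $\alpha > 0$ and $\beta \in \mathbb{R}$ such that $\alpha a_i + \beta \in [0,1]$, an
operator C satisfies

    $C(\alpha a_1 + \beta, \dots, \alpha a_N + \beta) = \alpha C(a_1, \dots, a_N) + \beta$

if and only if there exists an operator C' such that, for all $(a_1, \dots, a_N)$ in
$[0,1]^N$, we have

    $C(a_1, \dots, a_N) = \begin{cases} a & \text{if } a_i = a \text{ for all } i \\ (a^* - a_*)\, C'\left( \frac{a_1 - a_*}{a^* - a_*}, \dots, \frac{a_N - a_*}{a^* - a_*} \right) + a_* & \text{otherwise,} \end{cases}$    (4.18)

where $a^* = \max_i a_i$ and $a_* = \min_i a_i$.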


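Proposition 4.26. For $\alpha > 0$ and $\beta \in \mathbb{R}$ such that $\alpha a_i + \beta \in [0,1]$, an
operator C satisfies

    $C(\alpha a_1 + \beta, \dots, \alpha a_N + \beta) = \alpha C(a_1, \dots, a_N) + \beta$

if and only if there exists an operator C' such that, for all $(a_1, \dots, a_N)$ in
$[0,1]^N$, we have

    $C(a_1, \dots, a_N) = \begin{cases} a & \text{if } a_i = a \text{ for all } i \\ \bar{a} + \sigma\, C'\left( \frac{a_1 - \bar{a}}{\sigma}, \dots, \frac{a_N - \bar{a}}{\sigma} \right) & \text{otherwise,} \end{cases}$    (4.19)

where $\bar{a} = \sum_{i=1}^{N} a_i / N$ and $\sigma = \sqrt{\sum_{i=1}^{N} (a_i - \bar{a})^2 / N}$.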
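
The construction in Proposition 4.26 can be checked numerically. The sketch below (in Python) builds an operator C from an arbitrary C' by standardizing the inputs with their mean and standard deviation, and then verifies the interval-scale condition for one choice of $\alpha$ and $\beta$. The function names and the particular C' (the average of the largest and smallest standardized values) are assumptions made only for illustration.

```python
import math

def build_operator(c_prime):
    """Build C from an arbitrary C' following the form given in Proposition 4.26."""
    def C(a):
        if all(x == a[0] for x in a):
            return a[0]  # unanimity case: all inputs equal
        mean = sum(a) / len(a)
        sigma = math.sqrt(sum((x - mean) ** 2 for x in a) / len(a))
        return mean + sigma * c_prime([(x - mean) / sigma for x in a])
    return C

# Hypothetical C': midpoint between the largest and smallest standardized value.
C = build_operator(lambda z: 0.5 * max(z) + 0.5 * min(z))

a = [0.2, 0.5, 0.9]
alpha, beta = 0.5, 0.1  # keeps alpha * a_i + beta inside [0, 1]

lhs = C([alpha * x + beta for x in a])
rhs = alpha * C(a) + beta
print(lhs, rhs)
```

With this C', the resulting operator is the midrange (max + min) / 2, so both sides of the condition coincide up to rounding.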
4.3.1 Ordinal Scales


In the previous section, we have considered interval and ratio scales. We
consider here ordinal scales. This corresponds to the case in which permissible
transformations are functions $\varphi$ such that $x > y$ implies $\varphi(x) > \varphi(y)$.

We start by defining the boolean max-min functions. They are used later 
in this section. 


Definition 4.27. Let $X = \{x_1, \dots, x_N\}$ be the set of information sources
(reference set), let $a_i = f(x_i)$ be the value supplied by source $x_i$, and let
$S = \{S_j\}_j$ be a family of subsets of X. Then, the boolean max-min function
of S is defined by

    $B_S(a_1, \dots, a_N) = \bigvee_{S_j \in S} \bigwedge_{x_i \in S_j} f(x_i).$

Note that $f(x_i)$ is only required to take values in a totally ordered set. So,
it is valid for f to be a function from X into L, where L is a set of ordered
categories.

In the following, however, we will assume that the values belong to a real
interval E, and then consider $\Phi(E)$, the group of all increasing bijections
$\varphi : E \to E$ (the automorphisms of E). So, we will consider $\Phi$-invariance
(Definition 4.19) under increasing bijections.


Example 4.28. Let us consider the problem of evaluating students on the basis
of their marks in five different subjects: three science and two humanities. The
three science subjects are Mathematics (M), Physics (P), and Mathematical
Logic (ML), and the two humanities subjects are Literature (L) and Greek
(G).

We will consider good students those who have a good mark in at least
two science subjects and one of the other subjects. The marks are represented
by a mapping f into [0, 1] (the larger the better).

This situation can be modeled using the boolean max-min function given
below. It corresponds to the boolean max-min function of S, with S defined
by $S = \{S_j\}_{j=1}^{6}$ and $S_1 = \{M, P, L\}$, $S_2 = \{M, ML, L\}$, $S_3 = \{P, ML, L\}$,
$S_4 = \{M, P, G\}$, $S_5 = \{M, ML, G\}$, and $S_6 = \{P, ML, G\}$. Note that the sets
$S_j$ are the ones with two science subjects and either Literature or Greek:

    $(f(M) \wedge f(P) \wedge f(L)) \vee (f(M) \wedge f(ML) \wedge f(L)) \vee (f(P) \wedge f(ML) \wedge f(L)) \vee$
    $(f(M) \wedge f(P) \wedge f(G)) \vee (f(M) \wedge f(ML) \wedge f(G)) \vee (f(P) \wedge f(ML) \wedge f(G))$
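
The evaluation in Example 4.28 can be sketched in a few lines of Python. The function name boolean_max_min and the marks used below are hypothetical and serve only to illustrate the computation: the maximum, over the subsets of the family, of the minimum mark within each subset.

```python
from itertools import combinations

def boolean_max_min(family, f):
    """Maximum over the subsets in `family` of the minimum of f on each subset."""
    return max(min(f[x] for x in subset) for subset in family)

science = ["M", "P", "ML"]
humanities = ["L", "G"]

# The six subsets of Example 4.28: two science subjects plus one humanities subject.
family = [set(pair) | {h} for h in humanities for pair in combinations(science, 2)]

# Hypothetical marks in [0, 1] for one student.
marks = {"M": 0.9, "P": 0.4, "ML": 0.8, "L": 0.3, "G": 0.7}
score = boolean_max_min(family, marks)
print(score)

# Only the order of the marks matters: applying an increasing bijection of [0, 1]
# (here x -> x**2) to the inputs transforms the output in the same way.
squared = {subject: value ** 2 for subject, value in marks.items()}
score_squared = boolean_max_min(family, squared)
```

Here the best subset is {M, ML, G}, so the student's score is the lowest of those three marks.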

Besides $\Phi$-invariance, the property of $\Phi$-comparison meaningfulness is also of interest.


Definition 4.29. A function $C : E^N \to \mathbb{R}$ is $\Phi$-comparison meaningful if, for
any $\varphi \in \Phi(E)$ and any $x, x' \in E^N$, we have that

    $C(x_1, \dots, x_N) = C(x'_1, \dots, x'_N)$ implies
    $C(\varphi(x_1), \dots, \varphi(x_N)) = C(\varphi(x'_1), \dots, \varphi(x'_N)).$

The two properties are related, as shown by the following result.


Proposition 4.30. Let $C : E^N \to E$ be a function satisfying unanimity; then,
C is $\Phi$-comparison meaningful if and only if it is $\Phi$-invariant. Invariance
implies meaningfulness, and meaningfulness with unanimity implies invariance.


Now, we give some results directly concerning aggregation operators. The
first one shows that $B_S$ is the right operator in ordinal scales when an operator
with unanimity and monotonicity is required. The characterization requires
E to be open.


Proposition 4.31. For open sets E, an operator $C : E^N \to \mathbb{R}$ satisfies
unanimity, satisfies increasing monotonicity in each argument, and is
$\Phi$-comparison meaningful if and only if there exists a family S of subsets of
X such that $C = B_S$.


An alternative characterization exists that avoids E being open. In this 
case, C should be continuous. 


Proposition 4.32. A continuous operator $C : E^N \to \mathbb{R}$ satisfies unanimity
and is $\Phi$-comparison meaningful if and only if there exists a family S of subsets
of X such that $C = B_S$.


Now, when symmetry is also required, the permitted operators are re- 
stricted to be order statistics. Again, two propositions can be proved, one 
requiring the symmetric function to be continuous and the other requiring 
increasing monotonicity in each argument at the expense of E being open. 
Only the second one is given here. 


Proposition 4.33. For open sets E, an operator $C : E^N \to \mathbb{R}$ satisfies
symmetry, increasing monotonicity in each argument, and unanimity, and is
$\Phi$-comparison meaningful if and only if there exists a $k \in \{1, \dots, N\}$ such that
$C = x_{s(k)}$, where $x_{s(k)}$ corresponds to the kth order statistic.


It is known that, for an odd N, one of the order statistics ($OS_{(N+1)/2}$) is
the median. We now give a characterization of the median for odd values of
N. The characterization uses the invariance of the aggregation with respect
to a decreasing bijection. As this decreasing bijection can be understood as a
kind of negation or complement, we will denote it by neg.


Definition 4.34. Let $neg : E \to E$ be a decreasing bijection; then, the
function $C : E^N \to \mathbb{R}$ is neg-stable if, for any $x, x' \in E^N$, we have that

    $C(x_1, \dots, x_N) = C(x'_1, \dots, x'_N)$ implies
    $C(neg(x_1), \dots, neg(x_N)) = C(neg(x'_1), \dots, neg(x'_N)).$

As in the previous cases, two characterizations of the median can be
obtained. We offer the one that considers the monotonicity of the arguments.


Proposition 4.35. Let N be odd, and let neg be a decreasing bijection; then,
the operator $C : E^N \to \mathbb{R}$ fulfills symmetry, increasing monotonicity in each
argument, and unanimity, and is $\Phi$-comparison meaningful and neg-stable if
and only if C is the median.


Example 4.36. Let us consider again the problem of evaluating students on 
the basis of the five subjects in Example 4.28, with marks represented by f 
into [0, 1]. Then, when no relevance is given to any subject and unanimity is 
considered, any order statistic might be selected for aggregating the marks. 

When some symmetry can be found in the range of marks (e.g., an average
value of a can be compensated by 1 - a), let the function neg express
this symmetry. In this case, the assumption of neg-stability implies that the
aggregation operator is the median.

Informally, the neg-stability corresponds to the fact that, when two stu- 
dents have the same average, that average should be kept equal regardless of 
whether we are considering their original marks or the symmetric ones (i.e., 
regardless of whether we are considering f(subject) or 1 — f(subject)). 
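
The following sketch (in Python, with hypothetical marks for five subjects) illustrates the difference between the median and another order statistic, the minimum, with respect to neg-stability when neg(x) = 1 - x. Two students with the same minimum mark no longer agree after negation, whereas two students with the same median still do.

```python
import statistics

def neg(x):
    """Decreasing bijection on [0, 1] playing the role of negation."""
    return 1 - x

# Two hypothetical students with the same minimum mark (0.2).
x1 = [0.2, 0.5, 0.9, 0.6, 0.7]
x2 = [0.2, 0.3, 0.4, 0.3, 0.3]

# Two hypothetical students with the same median mark (0.5).
y1 = [0.1, 0.3, 0.5, 0.8, 0.9]
y2 = [0.4, 0.45, 0.5, 0.6, 0.7]

# Equality of the aggregated values before negation holds in both cases;
# we test whether it is preserved after applying neg to every mark.
min_stable = min(map(neg, x1)) == min(map(neg, x2))
median_stable = statistics.median(map(neg, y1)) == statistics.median(map(neg, y2))

print("minimum keeps equality:", min_stable)
print("median keeps equality:", median_stable)
```

This is only a check on one pair of inputs for each operator, not a proof; it shows a counterexample for the minimum and a consistent case for the median.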


4.3.2 Different Data in Different Scales 


Section 4.3 is devoted to the consideration of different scales (interval, ratio,
or ordinal). Nevertheless, we have restricted ourselves to the case where all
values are described using the same scale, and, thus, the permissible
transformations are the same for all values. There are some results in the
literature weakening these constraints, that is, results on operators that permit
different data to be represented in different scales. In such a case, the results
should be consistent under the changes of scale of the data sources. We consider
below one of the results obtained in this context. We focus on the case of N data
from N different ratio scales.


Proposition 4.37. Let us consider N evaluations of a given object using N
ratio scales with independent units. Then, an operator C that satisfies
unanimity and symmetry is meaningful to all the scales if and only if C is the
geometric mean:

    $C(a_1, \dots, a_N) = \left( \prod_{i=1}^{N} a_i \right)^{1/N}$

The following example illustrates this proposition.

Example 4.38. Let us consider the cities of Cirat and València and their
proximity to Barcelona. Let us consider the distance of these two cities to
Barcelona with respect to three scales: km, miles, and hours. We will consider
them as ratio scales. The values under consideration for these distances are as
follows:

• Distance(Barcelona, València) = 349 km, 216.85855 miles, 3.916 h (3h 55')
• Distance(Barcelona, Cirat) = 335 km, 208.16 miles, 4.016 h (4h 01')


To determine the nearest city, we will aggregate the values using the result of
Proposition 4.37 because, although all the scales are ratio scales, the concrete
scales (i.e., km, miles, and hours) are different. So, the aggregation operator
should be the geometric mean. The results of the evaluation for the two cities
are as follows:

• València: $(349 \cdot 216.85855 \cdot 3.916)^{1/3} = 66.67273$
• Cirat: $(335 \cdot 208.16 \cdot 4.016)^{1/3} = 65.4252$


So, Cirat is the nearest city. 

Now, let us consider some transformations on the last scale. That is, let us
reconsider the distance expressed in hours. If the scale units are changed from
hours to minutes, we have València at 235 minutes and Cirat at 241 minutes,
and if we consider the scale in seconds, then València is at 14,100 seconds and
Cirat at 14,460 seconds from Barcelona. Using the geometric mean, we get
the following:

Scale in minutes:
• València: $(349 \cdot 216.85855 \cdot 235)^{1/3} = 261.02972$
• Cirat: $(335 \cdot 208.16 \cdot 241)^{1/3} = 256.1453$

Scale in seconds:
• València: $(349 \cdot 216.85855 \cdot 14100)^{1/3} = 1021.8968$
• Cirat: $(335 \cdot 208.16 \cdot 14460)^{1/3} = 1002.7749$


So, naturally, in both cases, Cirat is still the nearest city. 

We now show that the use of another aggregation operator would modify 
our conclusions when the scale changes. In particular, let us consider the 
arithmetic mean of the three values. 


Scale in hours:
• València: (349 + 216.85855 + 3.916)/3 = 189.92485
• Cirat: (335 + 208.16 + 4.016)/3 = 182.39201

Scale in minutes:
• València: (349 + 216.85855 + 235)/3 = 266.95285
• Cirat: (335 + 208.16 + 241)/3 = 261.3867

Scale in seconds:
• València: (349 + 216.85855 + 14100)/3 = 4888.6196
• Cirat: (335 + 208.16 + 14460)/3 = 5001.053














So, in this case, when the time required for traveling was considered in
hours or minutes, the nearest city was Cirat. Nevertheless, this is not the
case when the last scale is expressed in seconds. So, the use of the arithmetic
mean is not sound under changes of scale. As Proposition 4.37 states, the only
sound operation is the geometric mean, as it is the only one that ensures that
changes in the scales do not alter the order inferred from the outcome.
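
The comparison in Example 4.38 can be replayed with a short script. This is a minimal sketch in Python (3.8 or later, for math.prod); the function names are ours, and the time is rescaled exactly by factors 60 and 3600 instead of using the rounded values 235/241 minutes, which does not change the conclusions.

```python
import math

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

def arithmetic_mean(values):
    return sum(values) / len(values)

# (km, miles, hours) to Barcelona, as given in Example 4.38.
valencia = (349, 216.85855, 3.916)
cirat = (335, 208.16, 4.016)

def nearest(mean, time_factor):
    """City with the smaller aggregated distance after rescaling the time scale."""
    v = mean((valencia[0], valencia[1], valencia[2] * time_factor))
    c = mean((cirat[0], cirat[1], cirat[2] * time_factor))
    return "Cirat" if c < v else "València"

factors = {"hours": 1, "minutes": 60, "seconds": 3600}
by_gm = {unit: nearest(geometric_mean, k) for unit, k in factors.items()}
by_am = {unit: nearest(arithmetic_mean, k) for unit, k in factors.items()}

print("geometric mean: ", by_gm)
print("arithmetic mean:", by_am)
```

With the geometric mean, rescaling one coordinate multiplies both aggregated values by the same factor, so the order cannot change. With the arithmetic mean, the rescaled coordinate eventually dominates the sum.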
4.4 Weighted Means


The aggregation operators reviewed so far do not include any kind of weight. In
this section, we introduce weights. The typical operator is the weighted mean.
In this operator, weights are used to represent the importance of the information
sources.

Proposition 4.17 characterizes the quasi-arithmetic mean, which corresponds
to the case in which all functions $g_i$ in Definition 4.14 are equal
($g_i = g$ for all i). When this restriction does not apply, we have weighted
aggregation operators. We characterize these operators below. Such a
characterization requires the sensitivity of C instead of requiring the
cancellativity of $\circ$. Sensitivity is defined as follows.


Definition 4.39. An operator C is sensitive if, for all $k = 1, \dots, N$, when
$a_k \ne a'_k$,

    $C(a_1, \dots, a_{k-1}, a_k, a_{k+1}, \dots, a_N) \ne C(a_1, \dots, a_{k-1}, a'_k, a_{k+1}, \dots, a_N).$

That is, changes in one parameter imply changes in the outcome. This
property implies the cancellativity of $\circ$.


Proposition 4.40. A sensitive operator C is separable in terms of functions
$g_i$ and satisfies unanimity if and only if it is of the form

    $C(a_1, \dots, a_N) = \varphi^{-1}\left( \sum_{i=1}^{N} \varphi_i(a_i) \right),$    (4.20)

with $\varphi(x) = \sum_{i=1}^{N} \varphi_i(x)$.

From this proposition we obtain results similar to the ones reported above
when $g_1 = \dots = g_N = g$.


Proposition 4.41. A sensitive operator C is separable in terms of monotone
increasing $g_i$ and satisfies unanimity and positive homogeneity
(Equation 4.15) if and only if C is either the weighted root-mean-power,

    $C(a_1, \dots, a_N) = \left( \sum_{i=1}^{N} p_i a_i^{\alpha} \right)^{1/\alpha}$

or the weighted geometric mean,

    $C(a_1, \dots, a_N) = \prod_{i=1}^{N} a_i^{p_i}$

with $p_i \ne 0$, $\alpha \ne 0$, and $\sum_{i=1}^{N} p_i = 1$, but otherwise arbitrary constants.

Proposition 4.42. A sensitive operator C is separable in terms of monotone
increasing $g_i$ and satisfies unanimity and reciprocity (Equation 4.16) if and
only if C is of the form

    $C(a_1, \dots, a_N) = \exp \omega^{-1}\left( \sum_{i=1}^{N} \omega_i(\log(a_i)) \right),$    (4.21)

with $\omega_i$ arbitrary, continuous, strictly monotone odd functions, and
$\omega(t) = \sum_{i=1}^{N} \omega_i(t)$ also strictly monotone.


Proposition 4.43. A sensitive operator C is separable in terms of monotone
increasing $g_i$ and satisfies unanimity, positive homogeneity (Equation 4.15),
and reciprocity (Equation 4.16) if and only if C is the weighted geometric
mean,

    $C(a_1, \dots, a_N) = \prod_{i=1}^{N} a_i^{p_i}$

with $p_i \ne 0$ and $\sum_{i=1}^{N} p_i = 1$, but otherwise arbitrary constants.

The weighted geometric mean (WGM) and the weighted mean (WM)
satisfy $WGM_p \le WM_p$ for any weighting vector p.
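
The relation between the weighted geometric mean and the weighted mean can be checked numerically. The sketch below (in Python) uses a hypothetical weighting vector and hypothetical positive inputs; the function names are ours.

```python
import math

def weighted_mean(p, a):
    """Weighted mean: sum of p_i * a_i (weights are assumed to add up to 1)."""
    return sum(pi * ai for pi, ai in zip(p, a))

def weighted_geometric_mean(p, a):
    """Weighted geometric mean: product of a_i ** p_i, computed through logarithms."""
    return math.exp(sum(pi * math.log(ai) for pi, ai in zip(p, a)))

p = [0.5, 0.3, 0.2]   # hypothetical weighting vector
a = [4.0, 1.0, 9.0]   # hypothetical positive inputs

wm = weighted_mean(p, a)
wgm = weighted_geometric_mean(p, a)
print(wgm, "<=", wm)

# Unanimity: when all inputs are equal, both operators return that value.
same = [3.0, 3.0, 3.0]
print(weighted_mean(p, same), weighted_geometric_mean(p, same))
```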


4.4.1 Bajraktarević's Means


Bajraktarević's means are a family of operators that generalize weighted
means. Their definition resembles that of a quasi-arithmetic mean with
weights (the quasi-weighted mean or quasi-linear mean). While in the weighted
mean, $\sum_i p_i a_i$, the weight $p_i$ is constant for all $a_i$, in a Bajraktarević's mean
the weight is a function of the $a_i$. Such a function is expressed by $\pi_i(a_i)$ in
the next definition.


Definition 4.44. Given functions $\pi$ and $\varphi$ (where $\varphi$ is monotone increasing
with inverse $\varphi^{-1}$ and $\pi$ nonnegative), the Bajraktarević's mean is defined as
follows:

    $C(a_1, \dots, a_N) = \varphi^{-1}\left( \frac{\sum_{i=1}^{N} \pi_i(a_i)\, \varphi(a_i)}{\sum_{i=1}^{N} \pi_i(a_i)} \right)$

The Bajraktarević's mean becomes the quasi-weighted mean when $\pi_i(a_i)$
is a constant that solely depends on i ($\pi_i(a_i) = p_i$), and the quasi-arithmetic
mean when $\pi_i(a_i) = k$ for all i. Families of quasi-weighted means can be
generated applying the functions in Table 4.1.

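Another example is when $\pi_i(a) = a^q$ and $\varphi(a) = a^{p-q}$ for p > q (and
$\varphi(a) = \log a$ for p = q). In this case, we have that the Bajraktarević's mean
reduces to

    $C(a_1, \dots, a_N) = \left( \frac{\sum_{i=1}^{N} a_i^{p}}{\sum_{i=1}^{N} a_i^{q}} \right)^{1/(p-q)}$    (4.22)

for p > q, or to

    $C(a_1, \dots, a_N) = \exp\left( \frac{\sum_{i=1}^{N} a_i^{p} \log a_i}{\sum_{i=1}^{N} a_i^{p}} \right)$

for p = q.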

These means can be further particularized to obtain a root-mean-power
(with q = 0) and a counter-harmonic mean, which corresponds to Equation 4.22
with q = p - 1. That is,

    $C(a_1, \dots, a_N) = \frac{\sum_{i=1}^{N} a_i^{p}}{\sum_{i=1}^{N} a_i^{p-1}}$

Note that this mean can be understood as the mean of $a_i$ with weights
$a_i^{p-1}$.
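
A direct transcription of Definition 4.44 helps to see how the particular cases arise. The following Python sketch is illustrative only: the function name bajraktarevic_mean, its argument order, and the inputs are our assumptions. It recovers the weighted mean with constant weight functions, and the counter-harmonic mean with weight functions $a^{p-1}$ and $\varphi$ the identity (here p = 2).

```python
def bajraktarevic_mean(a, pis, phi, phi_inv):
    """phi_inv( sum_i pi_i(a_i) * phi(a_i) / sum_i pi_i(a_i) )."""
    num = sum(pi(x) * phi(x) for pi, x in zip(pis, a))
    den = sum(pi(x) for pi, x in zip(pis, a))
    return phi_inv(num / den)

def identity(x):
    return x

a = [1.0, 2.0, 4.0]  # hypothetical positive inputs

# Constant weight functions pi_i(a) = p_i and phi = identity: the weighted mean.
p = [0.2, 0.3, 0.5]
constant_weights = [lambda x, w=w: w for w in p]
wm = bajraktarevic_mean(a, constant_weights, identity, identity)

# pi_i(a) = a**(power - 1) and phi = identity: the counter-harmonic mean,
# i.e., sum(a_i**power) / sum(a_i**(power - 1)).
power = 2
chm = bajraktarevic_mean(a, [lambda x: x ** (power - 1)] * len(a), identity, identity)

print(wm, chm)
```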


4.5 Bibliographical Notes 


1. Associativity: Associativity has been extensively studied. Aczél (in [4]) 
gives detailed information on the authors who have dealt with associative 
operators. Aczél in [2] presents one of the characterizations. The book by 
Alsina, Frank, and Sklar [17] is entirely devoted to operators that satisfy 
this equation, for example, as t-norms. Alternatively, Sander in [346] re- 
views several results on associative aggregation operators. Other results 
can be obtained from papers on specific operators, for example, papers 
on t-norms, t-conorms, uninorms, nullnorms, and so on. See below for 
some references on uninorms and nullnorms. Section 2.3.1 was devoted to 
t-norms and t-conorms. See the corresponding bibliographical section for 
some references on these operators. 

The Hamacher family is described in several papers, e.g., [73]. The orig- 
inal definition seems to be in [180, 210]. 

The expression for combining velocities in the theory of relativity ap- 
pears in Einstein's work [117]. 

2. Combination of certainty factors: The expert system MYCIN is
described in [48] and [360]. MYCIN introduces certainty factors and their
parallel and sequential combination functions. The parallel combination
is given in Equation 4.11. Tsadiras and Margaritis [419] prove that this
function is a uninorm. Later, De Baets and Fodor [89] introduce uninorms
in an arbitrary domain [a, b], and give a shorter proof for the result given
in [419]. In particular, in the original domain [-1, 1], Equation 4.11 is a
uninorm generated by the following generator:

    $h(a) = \begin{cases} \log(1 + a) & \text{if } a \le 0 \\ -\log(1 - a) & \text{if } a \ge 0 \end{cases}$    (4.23)

with inverse

    $h^{-1}(x) = \begin{cases} e^{x} - 1 & \text{if } x \le 0 \\ -e^{-x} + 1 & \text{if } x \ge 0. \end{cases}$    (4.24)


De Baets and Fodor [89] show that the operator used in PROSPEC- 
TOR [99] is also a uninorm. This operator corresponds to Cə (Equa- 
tion 4.4). They prove that when this operator is rescaled into [0, 1] (from 
the standard [—1, 1]), it corresponds to the uninorm Dy for A = 1. 

Details about the use of these two combination functions and their 
relationships in the framework of certainty factors are given in [184] (p. 
180). This work by Heckerman gives a sound probabilistic interpretation 
of these factors, showing some inconsistencies of the original definition in 
MYCIN. Previously, Hajek in [177] and later Hajek and Valdés in [178] 
also studied the problem of interpretations of certainty factors. 

The definition of certainty factors as values "to represent subjective
measures of change in belief" appears in [184]. Descriptions of the
combination functions in the first expert systems can be found in some
artificial intelligence books, e.g., [236].


3. Uninorms: Uninorms were introduced by Yager and Rybalov in [457]
(which is, probably, the oldest of Yager's published works [445]).
Mathematical properties of these operators are presented in [148]. Results by
Dombi in [97] and Klement, Mesiar, and Pap in [209] are closely related
and relevant to uninorms. In fact, [209] points out that the associative
compensatory operators they define (rooted in Dombi's work) are the
uninorms that are continuous on $[0,1]^2 \setminus \{(0,1), (1,0)\}$. The links between
uninorms and previous research are also highlighted in [148]. Other results
on uninorms can be found in [88] (on residual operators of uninorms).

Recent reviews on uninorms, nullnorms, and related subjects can be 
found in [346] and [55]. The former surveys associative and increasing 
aggregation operators and the latter reviews (in Sections 6.2 and 6.3) 
uninorms and nullnorms. A related subject, not considered in this chapter, 
is the introduction of weights into uninorms. This was proposed by Yager 
and Rybalov in [457] and further studied in [451]. 


4. The mean: The use of the average is already present in ancient texts. Plackett
in [322] provides some ancient references on the use of the mean, going
back to Hipparchus (ca. 190 BC - ca. 120 BC) and Ptolemy (ca. 90 - ca.
168), for the computation of the number of days in a year. An interesting
book on the mathematics behind Ptolemy's Almagest is [316].

Gini, in his book [163], published in 1958, also gives some historical
references from ancient times. Relationships between means and proportions
are described: “Un'indagine sull'evoluzione del concetto di media dai tempi
più remoti fino ai giorni nostri, deve prendere le mosse dallo studio delle
proporzioni, poiché inizialmente non si faceva una rigorosa distinzione fra il
concetto di media ed il concetto di proporzioni”² (p. 1, [163]). Arithmetic,
geometric, and harmonic means, as well as other means, are described
with respect to particular proportions. Pythagoras and the Pythagoreans
are mentioned, as they discovered the relationship between musical notes,
numbers, and their proportion.

Another relevant classical author is Pappus from Alexandria (fl. c. 300-c. 
350). His Synagoge (Book III) describes geometrical constructions of some 
means. Among them, we can find arithmetic, geometric, and harmonic 
means. We have consulted a French translation of his work [313]. [297] 
includes some graphical proofs (proofs without words). 

In 1755, Simpson [363] argued in favor of the use of the mean. In his 
paper, he claims that “some persons, of considerable note, have been of 
opinion, and even publicly maintained, that one single observation, taken 
with due care, was as much to be relied on as the Mean of a great number." 
He discusses this fact and studies some situations proving that the mean 
leads to better results, and he concludes recommending “the use of the 
method, not only to astronomers, but to all others concerned in making 
experiments of any kind." 

More recently, Cauchy (1821) [66] defined the mean of $x_1, \dots, x_N$ as
a value x fulfilling internality (Equation 4.5). Also, in 1821, [20] was
published. This paper, which is anonymous, is attributed to Svanberg
(see [123], p. 157). The author deals with the problem of finding the best
average of a number of observations. The paper distinguishes between means
for values that are different in origin and values that are the same in origin but
different due to imperfection of instruments or errors due to “la maladresse
ou la négligence de ceux qui ont mis ces instrumens et ces procédés en
usage”³ (p. 181). The paper presents some means, some of them weighted.
It also defines an iterative approach using weighted means where weights
are determined using previously estimated means. That is,





1 
Qo tima 
— ST 
Neu 


where m^ is estimated using the weighted mean. 

The paper also expresses the idea of mean as the value that differs as
little as possible from the values being aggregated: “La moyenne cherchée
est alors une combinaison de ces divers résultats, de laquelle on puisse
présumer qu'elle diffère moins du véritable et unique résultat que toute

m 





0 


2 An investigation on the evolution of the concept of mean throughout history
should start with the study of proportions, since initially there was no strict
distinction between the concept of mean and the concept of proportion.

3 The clumsiness or carelessness of those that put these instruments and procedures
into use.
autre combinaison qu'on pourrait faire des mêmes donnés”⁴ (p. 187). This
is related to the aggregation as the object that is located at the least
distance from the ones to be aggregated (as in Figure 1.5).

Chisini (1929) [76] defined the mean of a function $f(x_1, \dots, x_N)$ as the
value $\bar{x}$ such that $f(x_1, \dots, x_N) = f(\bar{x}, \dots, \bar{x})$. De Finetti (1931) [91]
reconsiders Chisini's definition for the mean and offers an alternative one: “si
definisce media di una grandezza in una data distribuzione (di qualunque
natura essa sia) per rapporto a un'assegnata circostanza quell'unico valore
della grandezza che si può sostituire alla distribuzione senza alterarvi
la circostanza in parola”⁵ (p. 375). A similar definition was previously
given by Bemporad (1926) [40]: “Siano $x_1, x_2, \dots, x_{n-1}, x_n$ n misure
indipendenti di una stessa quantità x. Assumeremo i postulati seguenti: I.-
Qualunque sia il numero delle misure eseguite, esiste un numero che è da
assumersi come risultato complessivo di tutte le misure. Il risultato
complessivo di un numero qualunque di misure è quantità in tutto paragonabile
al risultato singolo di una sola osservazione, e deve quindi essere
considerato alla stessa stregua. Esso è dato da una funzione delle $x_1, x_2, \dots, x_n$,
finita e continua insieme con le sue derivate parziali prime.”⁶ Bemporad
added to this definition three additional constraints (conditions II, III,
and IV in [40]) to further restrict the function.

The work by Kolmogorov [214] and Nagumo [286] (1930) had a strong
influence on the field. They studied means taking into consideration
decomposability. An operator is decomposable if, when considering a
sequence of functions

    $C^{(1)}(a_1), C^{(2)}(a_1, a_2), C^{(3)}(a_1, a_2, a_3), \dots, C^{(m)}(a_1, a_2, \dots, a_m), \dots$

we have

    $C^{(m)}(a_1, \dots, a_k, a_{k+1}, \dots, a_m) = C^{(m)}(a, \dots, a, a_{k+1}, \dots, a_m)$

for $k = 1, \dots, m$ and $a = C^{(k)}(a_1, \dots, a_k)$.

More specifically, Kolmogorov [214] and Nagumo [286] independently
studied decomposability and proved that the quasi-arithmetic mean is

4 The mean we look for is, then, a combination of the different results, which can
be presumed to differ less from the true and unique result than any other
combination that can be obtained from the same data.

5 The mean of a physical quantity for a given distribution (of whichever nature)
with respect to a given setting is defined by the single quantity that can replace
the distribution without modifying the above mentioned setting.

6 Let $x_1, x_2, \dots, x_{n-1}, x_n$ be n independent measures of the same quantity x. We
will assume the following axioms: I.- Whichever the number of measures carried
out, there is a number that must be taken as the overall result of all the measures.
The overall result of an arbitrary number of measures is a quantity completely
comparable to the individual result of a single observation, and should be
considered in the same way. It is given by a function of $x_1, x_2, \dots, x_n$, bounded,
continuous and with continuous first partial derivatives.
characterized by continuity, symmetry, strict increasingness, and
decomposability.

Another example of a basic property studied in relation to means is
bisymmetry. That is, $C(C(x, y), C(u, z)) = C(C(x, u), C(y, z))$. This
property has been used for characterizing quasi-arithmetic means. See [4] (p.
281).

Historical notes, references, and more recent results on means can be
found in the books by Aczél [4], Aczél and Dhombres [11], Chapter 5
of [146], and the papers [144] and [246], among others. [144]
generalizes the results of Kolmogorov and Nagumo by considering
nonstrict means (instead of strict ones).

5. Aggregation on numerical scales: Sections 4.2 and 4.3 are mainly
based on [8] by Aczél and Alsina, and [4] and [5] by Aczél. Some
properties also come from the book by Hardy, Littlewood, and Pólya [122].
This book, devoted to inequalities, contains some of the basic results on
aggregation in numerical scales.

The separability of an aggregation operator was first considered in [13]. 
[13] proves Proposition 4.23, and then characterizes the geometric mean in 
terms of the quasi-arithmetic mean, reciprocity, and positive homogeneity 
(Proposition 4.24). Proofs of Proposition 4.20 can be found in Jessen [205] 
and [91], p. 390-392 (see also [4], pp. 150-153, and [122], pp. 68-69). 

Example 4.21 uses the data from performance comparison of Java/.NET 
runtimes given in [361]. The use of aggregation operators to combine per- 
formance and times can be found in several other works. For example, [46] 
compares average execution times to evaluate strategies for garbage col- 
lection (in Java), and [49] compares execution times of benchmarks in 
Java, Fortran, and C. 

Table 4.1 is based on [8] and [50] (mainly p. 218). Gini in [163] includes
most of the means in this chapter (e.g., the root-mean-power on p. 138
and the counter-harmonic mean on p. 139). The root-mean-powers are
linked with the expressions for computing aggregation as the object that
minimizes a distance when the distance is $d_p(a, b) = |a - b|^p$. Some results
for the aggregation operators can be found in the Bibliographical Notes
of Chapter 1. Nevertheless, the root-mean-power with parameter p - 1
is usually not equal to the object obtained with $d_p$, except for p = 1 and
p = 2. This result is reported in Taguchi [387] (1974). The root-mean-powers
were studied previously by Fechner [134] (1878) as generalized
measures of location. In fact, he considered measures of both location and
dispersion, and, thus, his work also considered the aggregation operators
defined in Chapter 1.

Properties of quasi-arithmetic means are mainly based on [51, 50, 122]. 
The geometric proof that HM < GM < AM (Figure 4.4) is given in [50] 
(p. 45) and [434]. The first proof is by Pappus of Alexandria in Synagoge 
(Book III) [313]. Proposition 4.25 is given in [251] and [12] (p. 220, case 
5b). Proposition 4.26 corresponds to Theorem 2 in [4] (p. 236). 
Other conditions than the ones considered here have also been studied
in the literature. For example, equations of the form

    $C(a_1^{p}, \dots, a_N^{p}) = C(a_1, \dots, a_N)^{p}$

are considered for one p with $p \ne 0, 1, -1$. See [7]. This work also
solves the case of two different equations with two different $p_1$ and $p_2$
($\log|p_1| / \log|p_2|$ should be irrational and finite). The use of two
different aggregation operators, one for measures and the other for ratios,
was considered by Aczél and Alsina [8] (see also [9]). This corresponds to
the following equation (C and Q represent the aggregation operators):

    $Q\left( \frac{a_1}{b_1}, \dots, \frac{a_N}{b_N} \right) = \frac{C(a_1, \dots, a_N)}{C(b_1, \dots, b_N)}$

The case of aggregation operators for data represented in different scales
is studied in [12]. Different cases corresponding to interval and ratio scales
are analyzed and characterized. Proposition 4.37 in this chapter
corresponds to Corollary 3.1 in [12].

Chapter 6 describes other operators for numerical scales. The
bibliographical notes in it are mainly devoted to such operators.

6. Aggregation in ordinal scales: Aggregation in ordinal scales has been
studied by several authors. Section 4.3.1 is based on the works by Marichal
(especially, [249]) and Ovchinnikov (especially, [305]). In particular,
Proposition 4.30 is given in [305].

Some of the results presented here have been proved under different 
assumptions on the range of the functions, e.g., R or ordered sets. 
See [247, 252, 304, 307] for details. See also [245] for a review of some 
results in this area. 

The operators considered in this section are not the only ones available 
for ordinal scales. As will be seen in Chapter 6, there are other opera- 
tors that can also be applied. In particular, the Sugeno integral (see Sec- 
tion 6.4), and some weighted operators, such as the weighted minimum 
and the weighted maximum (see Section 6.3), are suitable for ordinal 
scales. As will be shown, such operators do not satisfy symmetry. 

7. Weighted means: Separability for the weighted means was considered
by Aczél in [5]. The results described there are analogous to the ones of
the arithmetic mean. Propositions 4.41, 4.42, and 4.43 are proved in that
paper.

The weighted root-mean-powers were studied in [35] (1938). Bajraktarević's
mean was introduced in [28], and further studied by Bajraktarević
in [29, 30] and by Losonczi in [237]. [29] considers weighting functions
of the form $p_i \pi(a_i)$, and on p. 73 it gives an expression analogous to
Equation 4.22 but with additional weights $p_i$:

    $\left( \frac{\sum_{i=1}^{N} p_i a_i^{p}}{\sum_{i=1}^{N} p_i a_i^{q}} \right)^{1/(p-q)}$



Bajraktarević's means have recently been applied in [106]. For properties
of the mean, such as relationships with the root-mean-powers and the
counter-harmonic means, see [50].


5 


Fuzzy Measures 


Sanbonn no ya¹


Japanese saying 


Most aggregation operators use some kind of parameterization to express addi- 
tional information about the objects that take part in the aggregation process. 
Applying the jargon of artificial intelligence, we can say that the parameters 
are used to represent the background knowledge. For example, it is well known 
that in the case of the weighted mean, the weights — i.e., the weighting vector 
— play this role. In an application, we can use them to express the reliability 
of the information sources (sensors, experts, and so on). For example, when 
fusing data from sensors, we can express which sensor is more likely to give
data of better quality and which is more likely to give erroneous data. In a 
similar way, other aggregation functions use other parameterizations. 

Among all the existing types of parameters, the fuzzy measures are a rich 
and important family. They are of interest here because they are used for 
aggregation purposes in conjunction with several fuzzy integrals (e.g., Choquet 
and Sugeno integrals). Also, they are general in the sense that, when used 
with some of the integrals, they can generalize some well-known aggregation 
operators, such as the weighted mean. 

The term "fuzzy measure" is not the only one used in the literature. The
same concept also appears under other names, such as non-additive
probabilities and capacities.

Due to the importance of fuzzy measures, we devote this chapter to them. 
We give their definitions, establish some of the main families, and review some 
of their properties. Chapter 6 shows their use with fuzzy integrals and provides 
a justification of their interest in aggregation (see Section 6.2). 


¹ One arrow is weak, but if three arrows come together they are very strong.

5.1 Definitions, Interpretations, and Properties


We begin with the definition of fuzzy measures. We assume that the set over
which the fuzzy measure is defined is finite, as this is the usual case with
aggregation operators. Nevertheless, some of the results given here also hold
for infinite sets.


Definition 5.1. A fuzzy measure $\mu$ on a set $X$ is a set function
$\mu : \wp(X) \to [0, 1]$ satisfying the following axioms:


(i) $\mu(\emptyset) = 0$, $\mu(X) = 1$ (boundary conditions)
(ii) $A \subseteq B$ implies $\mu(A) \le \mu(B)$ (monotonicity)


The requirement that the measure of the whole set is 1 ($\mu(X) = 1$) is an
arbitrary convention; in general, any other value might be used. Nevertheless,
this requirement is especially convenient for aggregation purposes, and it is,
in fact, analogous to requiring the weights of a weighted mean to add to 1.
Therefore, unless otherwise stated, we assume this bound throughout this book.
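
The two axioms of Definition 5.1 can be checked mechanically on a finite set. The following Python sketch assumes, as our own choice of representation, that a measure is stored as a dictionary mapping each subset (a `frozenset`) to its value; the names `powerset` and `is_fuzzy_measure` and the sample values are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_fuzzy_measure(mu, universe, tol=1e-12):
    """Check the axioms of Definition 5.1 for mu: dict frozenset -> float."""
    full = frozenset(universe)
    subsets = powerset(universe)
    # Every subset needs a value in [0, 1].
    if any(s not in mu or not (0.0 <= mu[s] <= 1.0) for s in subsets):
        return False
    # (i) boundary conditions
    if abs(mu[frozenset()]) > tol or abs(mu[full] - 1.0) > tol:
        return False
    # (ii) monotonicity: comparing A with A plus one element is enough,
    # because the general case follows by chaining such steps.
    for a in subsets:
        for x in full - a:
            if mu[a] > mu[a | {x}] + tol:
                return False
    return True

# Illustrative data (not from the text): a cardinality-based measure on
# three elements, and a copy with one value lowered to break monotonicity.
X = "abc"
mu_ok = {s: len(s) / 3 for s in powerset(X)}
mu_bad = dict(mu_ok)
mu_bad[frozenset("ab")] = 0.1   # below mu({a}) = 1/3

print(is_fuzzy_measure(mu_ok, X), is_fuzzy_measure(mu_bad, X))
```

Checking only the pairs $(A, A \cup \{x\})$ keeps the number of comparisons at most $|X| \cdot 2^{|X|}$ instead of comparing every pair of subsets.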

Among fuzzy measures, we distinguish those where $\mu(A)$ is either 0 or 1.
Such measures are known as 0-1 fuzzy measures. We define them below, as
they will be used later.


Definition 5.2. $\mu$ is a 0-1 fuzzy measure if $\mu$ is a fuzzy measure and,
for all $A \subseteq X$, it holds that $\mu(A) \in \{0, 1\}$.


As can be observed from their definition, fuzzy measures replace the axiom
of additivity satisfied by probability measures (see Section 2.2) by a more
general one, monotonicity. This implies that probability measures are
particular cases of fuzzy ones. They correspond, in fact, to additive fuzzy
measures (measures satisfying $\mu(A \cup B) = \mu(A) + \mu(B)$ for disjoint
$A$ and $B$).

Additive measures have the important property that the whole measure
can be defined from the values assigned to the singletons. This corresponds to
having a mapping from $X$ into $[0, 1]$ (a probability distribution) and
defining the measure of a set $A$ (the probability of the set $A$) from such a
mapping. Let $p$ be the mapping; then $\mu(A) = \sum_{a \in A} p(a)$. From this
point of view, weighting vectors can be seen as probability distributions, and,
thus, they can be used to infer fuzzy measures.


Definition 5.3. Let $p$ be a weighting vector defined over the set $X$ (i.e.,
$p : X \to [0, 1]$ with $\sum_{x_i \in X} p(x_i) = 1$); then, the additive fuzzy
measure $\mu$ inferred from $p$ is defined as

$$\mu(A) = \sum_{a \in A} p(a)$$

for all $A \subseteq X$.

Fig. 5.1. $\mu$-inter-additive partition $\{X_1, X_2, \ldots, X_s\}$ of $X$.
$\perp$ is used to represent the operator that combines the measures of the
different partition elements (i.e., the addition)
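
Definition 5.3 translates directly into code. The sketch below is illustrative only: the function name `additive_measure` and the sensor weights are hypothetical, and the measure is represented as a dictionary from subsets to values.

```python
from itertools import combinations

def additive_measure(p):
    """Additive fuzzy measure inferred from a weighting vector p
    (dict: element -> weight), following Definition 5.3."""
    if abs(sum(p.values()) - 1.0) > 1e-9:
        raise ValueError("weights must add to 1")
    elems = list(p)
    # mu(A) is the sum of the weights of the elements of A.
    return {frozenset(c): sum(p[x] for x in c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

# Hypothetical reliabilities for three sensors.
p = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
mu = additive_measure(p)
print(mu[frozenset({"s1", "s3"})])
```

Only the $|X|$ weights are supplied; the remaining $2^{|X|} - |X|$ values of the measure follow from them.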


Additivity implies that the measure of a set is the sum of the measures of
its elements. Thus, no interaction among the elements with respect to the
measure is considered. In fact, this way of measuring a set is implicitly
followed when we use the weighted mean. This is supported by the fact that the
Choquet integral (a fuzzy integral described in Section 6.2) with respect to an
additive fuzzy measure $\mu$ inferred from a weighting vector $p$ is equivalent
to a weighted mean with respect to $p$ (see Theorem 6.24).

In additive measures, addition is used to combine the measures of the
singletons. A natural generalization is to use addition to combine the measures
of larger sets. In this case, for pairs $x_i, x_j$ we might have
$\mu(\{x_i, x_j\}) \neq \mu(\{x_i\}) + \mu(\{x_j\})$. Nevertheless, for larger
sets we might have $\mu(X \cup Y) = \mu(X) + \mu(Y)$, with
$X \cap Y = \emptyset$. Figure 5.1 gives a graphical representation of this kind
of measure. In the figure, $\perp$ is used to represent the combination
operator (addition). The next definition formalizes this concept.


Definition 5.4. Let $\mu$ be a fuzzy measure on $X$ and let
$P = \{X_1, \ldots, X_s\}$ be a partition of $X$; then, $P$ is a
$\mu$-inter-additive partition of $X$ if

$$\mu(A) = \sum_{X_i \in P} \mu(A \cap X_i)$$

for every $A \in \wp(X)$. If $P$ is a $\mu$-inter-additive partition of $X$
with at least two elements, then we will say that $\mu$ is an inter-additive
fuzzy measure.


Naturally, if $P = \{\{x_1\}, \{x_2\}, \ldots, \{x_n\}\}$ is a
$\mu$-inter-additive partition of $X$, then $\mu$ is additive.

We now consider an example of a fuzzy measure that is neither additive nor
inter-additive.

Definition 5.5. Let $X$ be a reference set; then, the measure $\mu^*$ (the
strongest fuzzy measure) is defined by

$$\mu^*(A) = \begin{cases} 0 & \text{if } A = \emptyset \\ 1 & \text{otherwise} \end{cases}$$

This measure can be used to model total ignorance, because the measure
is maximal for all subsets of $X$.

An important concept for fuzzy measures is their duality. This is
established in the following definition.


Definition 5.6. Let $\mu_1$ and $\mu_2$ be two fuzzy measures on $X$; then,
$\mu_1$ and $\mu_2$ are dual conjugates if and only if they satisfy

$$\mu_1(A) = 1 - \mu_2(X \setminus A) \qquad (5.1)$$

for all $A \subseteq X$.

The following two propositions hold for fuzzy measures.

Proposition 5.7. The dual of a fuzzy measure is another fuzzy measure.


Proposition 5.8. Let $\mu$ be an additive fuzzy measure; then, the dual of
$\mu$ is $\mu$.


Example 5.9. The dual of the measure $\mu^*$ in Definition 5.5 is

$$\mu_*(A) = \begin{cases} 0 & \text{if } A \neq X \\ 1 & \text{otherwise.} \end{cases}$$

This measure is the weakest fuzzy measure.
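
Duality (Definition 5.6) is a one-line computation once a measure is tabulated. The following sketch, with hypothetical helper names `powerset` and `dual` and a three-element reference set chosen only for illustration, builds the strongest measure of Definition 5.5 and obtains the weakest one as its dual.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def dual(mu, universe):
    """Dual measure: mu_d(A) = 1 - mu(X \\ A) for every subset A."""
    full = frozenset(universe)
    return {a: 1.0 - mu[full - a] for a in powerset(universe)}

X = "abc"
# Strongest measure: 0 on the empty set, 1 everywhere else.
strongest = {a: (1.0 if a else 0.0) for a in powerset(X)}
weakest = dual(strongest, X)

# The weakest measure is 1 only on X itself.
print(sorted((sorted(a), v) for a, v in weakest.items()))
```

Applying `dual` twice returns the original measure, which agrees with the fact that duality is a symmetric relation.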


Another concept worth presenting here is consistency between fuzzy mea- 
sures. 


Definition 5.10. Given two fuzzy measures $\mu$ and $\mu'$ on $X$, we say that
$\mu'$ is consistent with $\mu$ when $\mu(A) > \mu(B)$ implies that
$\mu'(A) > \mu'(B)$ for all $A, B \subseteq X$. If $\mu'$ is consistent with
$\mu$, and $\mu$ is consistent with $\mu'$, then we say that $\mu$ and $\mu'$
are consistent fuzzy measures.
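
Consistency in the sense of Definition 5.10 can be tested by comparing all pairs of subsets. The sketch below uses a hypothetical function name, `is_consistent_with`, and invented values on a two-element set; it shows that the relation is not symmetric when one measure has ties that the other breaks.

```python
def is_consistent_with(mu_prime, mu):
    """True if mu_prime is consistent with mu, i.e.,
    mu(A) > mu(B) implies mu_prime(A) > mu_prime(B) for all A, B."""
    sets = list(mu)
    return all(mu_prime[a] > mu_prime[b]
               for a in sets for b in sets if mu[a] > mu[b])

E, X1, X2, X = frozenset(), frozenset({"x1"}), frozenset({"x2"}), frozenset({"x1", "x2"})

# Illustrative values only: mu ties the two singletons, mu_p separates them.
mu = {E: 0.0, X1: 0.5, X2: 0.5, X: 1.0}
mu_p = {E: 0.0, X1: 0.3, X2: 0.6, X: 1.0}

print(is_consistent_with(mu_p, mu))   # every strict order of mu is kept by mu_p
print(is_consistent_with(mu, mu_p))   # mu_p orders the singletons, mu does not
```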


Example 5.11. Let us consider the fuzzy measures $\mu$ and $\mu'$ defined as in
Table 5.1. We have that $\mu'$ is consistent with $\mu$. Nevertheless, $\mu$ is
not consistent with $\mu'$ because $\mu'(\{x_2, x_3\}) < \mu'(\{x_1, x_3\})$ but
$\mu(\{x_2, x_3\}) = \mu(\{x_1, x_3\})$.

Set                    $\mu$   $\mu'$
$\{x_1, x_2\}$         …       …
$\{x_2, x_3\}$         …       …
$\{x_1, x_3\}$         …       …
$\{x_1, x_2, x_3\}$    1.0     1.0

Table 5.1. Fuzzy measures $\mu$ and $\mu'$ for Example 5.11.


5.1.1 Interpretations 


The meaning of fuzzy measures has been studied by several authors since their
inception. Interpretations are closely tied to particular uses of the data,
and, additionally, some families of fuzzy measures permit specific
interpretations. The most relevant case corresponds to probabilities. A related
case is the distorted probability (described in Section 5.4), which can be seen
as a probability distribution that has been distorted by a function. Finally,
0-1 fuzzy measures can be understood in terms of coalitions, e.g., in a
parliament; that is, given a 0-1 fuzzy measure $\mu$, we have that $\mu(A)$ is 1
if and only if the coalition $A$ is able to pass a bill in the parliament.

This section presents some of the main interpretations.


1. Fuzziness: A fuzzy measure $\mu$ for a set $A$, i.e., $\mu(A)$, corresponds
to the grade to which an element in the reference set $X$ belongs to the set
$A$. Here, fuzziness is understood as a kind of uncertainty that is different
from randomness. Such an approach follows the interpretation of membership
in fuzzy sets, which differ from probability distributions.

2. Importance: $\mu(A)$ stands for the degree of importance, or weight, of the
set $A$ when computing the aggregated value for $X$. This interpretation is
especially suited to combining criteria or experts. A related definition says
that $\mu(A)$ measures the power of $A$ to make the decision alone (without
the remaining criteria in $X \setminus A$).

3. Probability: Several interpretations have been constructed using
probability distributions as their cornerstone. They have mainly addressed
the belief and plausibility measures (a particular type of fuzzy measure
described in Section 5.2). The following are examples of interpretations
based on probabilities.


a) Belief and plausibility as inner and outer measures. In a probability
space $(X, \mathcal{X}, P)$, the probability $P$ is not necessarily defined on
$\wp(X)$ but instead only on $\mathcal{X}$. In this case, belief (denoted by
$Bel$) and plausibility ($Pl$) measures can be defined on $\wp(X)$ as
extensions of $P$ on $\mathcal{X}$. They are defined as follows:

$$Bel(A) = \sup \{P(Y) \mid Y \subseteq A \text{ and } Y \in \mathcal{X}\}$$

$$Pl(A) = \inf \{P(Y) \mid Y \supseteq A \text{ and } Y \in \mathcal{X}\}$$

Although, for all probability spaces $(X, \mathcal{X}, P)$, the functions $Bel$
and $Pl$ are belief and plausibility measures, the converse is not true.
Nevertheless, it is possible to establish such a converse relation in
terms of belief functions on a set of formulas rather than on sets.

b) Probability interval induced by a belief measure. Each belief measure
$Bel$ with its dual plausibility $Pl$ induces a probability interval
$\mathcal{P}_{Bel}$:

$$\mathcal{P}_{Bel} = \{p \mid Bel(A) \le p(A) \le Pl(A) \text{ for all } A \in \wp(X)\}.$$

Although every belief function is a lower envelope of a probability
interval, not every lower envelope is a belief function.

c) Belief as information loss. Belief is understood as a probability that
has suffered from a process of information loss. This process consists
of transferring support from sets $A$ into larger sets $B$ such that
$B \supset A$.

d) Distorted probabilities. Fuzzy measures on $X$ can be interpreted in
terms of a set of probability distributions on disjoint sets of $X$ and
a distortion function that, eventually, combines the probabilities and
modifies the values.

e) Mapping between spaces. In this case, a fuzzy measure $\mu$ in a space
$(X, 2^X)$ is represented in terms of an additive measure $\lambda$ on a
measurable space $(\Theta, 2^\Theta)$ and a mapping $\nu$ from one space to the
other. That is, when $\nu : \wp(X) \to \wp(\Theta)$, we have
$\mu(A) = \lambda(\nu(A))$. It has been proved that the Choquet integral is
consistent with this interpretation.


When used in aggregation, fuzzy measures are defined to quantify aspects
related to the information sources that supply information. Accordingly,
$X$ is the set of information sources. In this case, the interpretations
considered above correspond to the following.


1. Fuzziness: $\mu(A)$ is the grade to which the correct answer is given by an
information source that belongs to $A$. When the grade is bounded and
monotonic, the conditions in Definition 5.1 hold.

2. Importance: $\mu(A)$ stands for the degree of importance of the set $A$.
In this setting, condition (i) in Definition 5.1 means that when the set of
sources is empty their importance is 0 and that the maximal importance is
obtained when all the sources are considered (and the maximal importance
is 1); condition (ii) in Definition 5.1 means that the more sources we have,
the greater is the importance.

3. Probability: When the measure $\mu$ is additive, we have a probability
distribution on the information sources. Then, $\mu(A)$ is the probability of
the sources in $A$ being correct. In contrast, when the measure is not additive,
we have a generalized probability on the set of information sources.

$\mu(\emptyset) = 0$     | $\mu(\{M, L\}) = 0.9$
$\mu(\{M\}) = 0.45$      | $\mu(\{P, L\}) = 0.9$
$\mu(\{P\}) = 0.45$      | $\mu(\{M, P\}) = 0.5$
$\mu(\{L\}) = 0.3$       | $\mu(\{M, P, L\}) = 1$
Table 5.2. Fuzzy measure on the set $X = \{M, P, L\}$ following Example 5.12





An alternative way to interpret fuzzy measures is from an operational point
of view. As said before, fuzzy measures are often used in conjunction with
fuzzy integrals. In this case, the fuzzy integral of the characteristic function
of a set $A$ corresponds to the fuzzy measure of $A$. That is, given
$A \subseteq X$, let $f(x_i) = 1$ if $x_i \in A$ and $f(x_i) = 0$ otherwise;
then, the fuzzy integral of $f$ with respect to $\mu$ is $\mu(A)$. Accordingly,
and independently of any denotational semantics for $\mu(A)$, we can define
$\mu$ on the basis of the desired outcome for the fuzzy integral.

The following example of a fuzzy measure is frequently used in the
literature. It is similar to Example 4.28.


Example 5.12. The director of a high school has to evaluate the students
according to their level in mathematics (M), physics (P), and literature (L).
The evaluation consists of obtaining a final rating as an average of the ratings
of the three subjects. For each student, the final rating depends on the
importance given to the subjects. To settle these importances, a fuzzy measure
is used. Here, $X$ is the set of all subjects (i.e., $X = \{M, P, L\}$), and
$\mu(A)$ is the importance of a particular set of subjects $A$. The definition
of the measure considers the following elements.


1. Boundary conditions:
$\mu(\emptyset) = 0$, $\mu(\{M, P, L\}) = 1$
The importance of the empty set is 0. The set consisting of all subjects
has maximum importance.
2. Relative importance of scientific versus literary subjects:
$\mu(\{M\}) = \mu(\{P\}) = 0.45$, $\mu(\{L\}) = 0.3$
The importance of mathematics and physics is greater than the
importance of literature.
3. Redundancy between mathematics and physics:
$\mu(\{M, P\}) = 0.5 < \mu(\{M\}) + \mu(\{P\}) = 0.9$
Mathematics and physics are similar subjects. The importance of the
set containing both should not be larger than the sum of their importances.
4. Support between literature and scientific subjects:
$\mu(\{M, L\}) = 0.9 > \mu(\{M\}) + \mu(\{L\}) = 0.45 + 0.3 = 0.75$
$\mu(\{P, L\}) = 0.9 > \mu(\{P\}) + \mu(\{L\}) = 0.45 + 0.3 = 0.75$
Literature and the scientific subjects are complementary.


An outline of this fuzzy measure is given in Table 5.2. 
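
The measure of Example 5.12 is small enough to write down in full. The following sketch encodes it as a dictionary and checks the redundancy and support conditions numerically; the helper names `fs`, `interaction`, and `is_monotone` are our own and are not taken from the text.

```python
def fs(*elements):
    """Shorthand for building a frozenset of subject labels."""
    return frozenset(elements)

# Values of Example 5.12 (also listed in Table 5.2).
mu = {
    fs(): 0.0,
    fs("M"): 0.45, fs("P"): 0.45, fs("L"): 0.3,
    fs("M", "P"): 0.5, fs("M", "L"): 0.9, fs("P", "L"): 0.9,
    fs("M", "P", "L"): 1.0,
}

def interaction(mu, a, b):
    """mu({a, b}) - mu({a}) - mu({b}).
    Negative values indicate redundancy, positive values indicate support."""
    return mu[fs(a, b)] - mu[fs(a)] - mu[fs(b)]

def is_monotone(mu):
    """Axiom (ii) of Definition 5.1, checked over all pairs of subsets."""
    return all(mu[a] <= mu[b] for a in mu for b in mu if a <= b)

print(is_monotone(mu))
print(round(interaction(mu, "M", "P"), 2))   # redundancy between M and P
print(round(interaction(mu, "M", "L"), 2))   # support between M and L
```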




Now, we consider another example. 


Definition 5.13. Let $X$ be a set, and let $M$ be a set of fuzzy measures on
$X$ such that, for all $\mu_1, \mu_2 \in M$, we have
$\mu_1(\{x\}) = \mu_2(\{x\})$ for all $x \in X$; then, we define the minimal and
the maximal measure in $M$ as follows:


1. $\mu$ is the minimal measure of $M$ if, for all $A \in \wp(X)$ and all
$\mu' \in M$, we have $\mu(A) \le \mu'(A)$. Note that the minimal measure can
be built from the measure of the singletons in the following way:
$\mu(A) = \max_{a \in A} \mu(\{a\})$ if $A \neq X$ and $\mu(X) = 1$.

2. $\mu$ is the maximal measure of $M$ if, for all $A \in \wp(X)$ and all
$\mu' \in M$, we have $\mu(A) \ge \mu'(A)$. Note that maximal measures satisfy
$\mu(A) = 1$ for all $|A| > 1$.


Denoting the measure for the singletons as a mapping $v$ from $X$ into
$[0, 1]$, we denote the corresponding minimal measure by $\mu_{\min}(v)$, and
the corresponding maximal measure by $\mu_{\max}(v)$.
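
Both extreme measures of Definition 5.13 are determined by the singleton values alone. The sketch below builds them from a mapping `v`; the function names are hypothetical, and the sample values are the singleton importances of Example 5.12, reused here only for illustration.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def minimal_measure(v):
    """mu(A) = max of v over A for A != X, and mu(X) = 1 (mu(empty) = 0)."""
    full = frozenset(v)
    return {a: 1.0 if a == full else max((v[x] for x in a), default=0.0)
            for a in powerset(v)}

def maximal_measure(v):
    """mu agrees with v on singletons and is 1 on every larger set."""
    full = frozenset(v)

    def value(a):
        if a == full:
            return 1.0
        if len(a) == 0:
            return 0.0
        if len(a) == 1:
            return v[next(iter(a))]
        return 1.0

    return {a: value(a) for a in powerset(v)}

v = {"M": 0.45, "P": 0.45, "L": 0.3}
lo, hi = minimal_measure(v), maximal_measure(v)
print(lo[frozenset({"M", "L"})], hi[frozenset({"M", "L"})])
```

Any fuzzy measure with the same singleton values, such as the one in Example 5.12, lies between these two on every subset.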


5.1.2 Properties 


An important issue when defining fuzzy measures in practical applications is
the number of parameters required. It is clear that additive measures require
only $|X|$ parameters (the values on the singletons) and that unrestricted fuzzy
measures require $2^{|X|}$. Note that the number of parameters is, in fact,
reduced by 1 or 2 when boundary conditions are considered: $|X| - 1$ for
additive measures (as they add to 1) and $2^{|X|} - 2$ for unconstrained ones
(as $\mu(\emptyset) = 0$ and $\mu(X) = 1$). Additionally, when defining a
measure, besides supplying a certain number of values, some checking is required
to ensure that such values satisfy the monotonicity constraints. Note that, in
general, there exist $|X|!$ different monotonic sequences of subsets of $X$
going from $\emptyset$ to $X$.

Unconstrained fuzzy measures are an example of measures for which we not
only need to fix the values, but also to check them. In this case,
$2^{|X|} - 2$ values are required, and all sequences have to be checked. In
contrast, for additive measures, only $|X| - 1$ values are required and the only
check is whether their sum is equal to 1. Also, for the minimal and the maximal
measure of a mapping $v$, we only need the mapping (that is, $|X|$ values), and
to check that there is no $x_i$ such that $v(x_i) > 1$.

Fuzzy measures can be rewritten in an alternative way through the Möbius
transform. The Möbius transform of a fuzzy measure $\mu$ is a function
$m : \wp(X) \to \mathbb{R}$ such that $m(\emptyset) = 0$,
$\sum_{A \subseteq X} m(A) = 1$, and, if $A \subseteq B$, then
$\sum_{C \subseteq A} m(C) \le \sum_{C \subseteq B} m(C)$.


Definition 5.14. Let $\mu$ be a fuzzy measure; then, its Möbius transform $m$
is defined as

$$m_\mu(A) := \sum_{B \subseteq A} (-1)^{|A| - |B|} \mu(B) \qquad (5.2)$$

for all $A \subseteq X$.

$m(\emptyset) = 0$     | $m(\{M, L\}) = 0.15$
$m(\{M\}) = 0.45$      | $m(\{P, L\}) = 0.15$
$m(\{P\}) = 0.45$      | $m(\{M, P\}) = -0.4$
$m(\{L\}) = 0.3$       | $m(\{M, P, L\}) = -0.1$
Table 5.3. Möbius transform of the measure given in Table 5.2.





Note that the function $m$ is not restricted to the $[0, 1]$ interval.
Given a function $m$ that is a Möbius transform, we can reconstruct the
original measure as follows:

$$\mu(A) = \sum_{B \subseteq A} m(B)$$

for all $A \subseteq X$.


Example 5.15. Let $\mu$ be the fuzzy measure in Example 5.12 (and outlined in
Table 5.2); then, the Möbius transform of $\mu$ is given in Table 5.3.
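
The Möbius transform of Definition 5.14 and its inverse can be computed by direct summation over subsets. The following sketch recomputes the values of Table 5.3 from the measure of Example 5.12; the function names `subsets`, `mobius`, and `from_mobius` are our own, and the direct double loop is chosen for clarity rather than efficiency.

```python
from itertools import combinations

def subsets(s):
    """All subsets of the set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def mobius(mu):
    """m(A) = sum over B subset of A of (-1)^(|A|-|B|) * mu(B)  (Equation 5.2)."""
    return {a: sum((-1) ** (len(a) - len(b)) * mu[b] for b in subsets(a))
            for a in mu}

def from_mobius(m):
    """Inverse transform: mu(A) = sum over B subset of A of m(B)."""
    return {a: sum(m[b] for b in subsets(a)) for a in m}

# Measure of Example 5.12; each key string lists the subjects in the set.
values = {"": 0.0, "M": 0.45, "P": 0.45, "L": 0.3,
          "MP": 0.5, "ML": 0.9, "PL": 0.9, "MPL": 1.0}
mu = {frozenset(k): val for k, val in values.items()}

m = mobius(mu)
for key in ["M", "P", "L", "MP", "ML", "PL", "MPL"]:
    print(key, round(m[frozenset(key)], 2))
```

Applying `from_mobius` to `m` recovers `mu` up to floating-point rounding.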


When a measure is additive, the Möbius transform on the singletons
corresponds to the probability distribution, and it is zero for non-singletons.


Proposition 5.16. Let $\mu$ be an additive fuzzy measure and let $m$ be its
Möbius transform; then, $m(A) = 0$ for all $|A| > 1$. Moreover, let
$p(x_i) = m(\{x_i\})$ for all $x_i \in X$; then, $p$ is a probability
distribution, or a weighting vector, that infers $\mu$.


Taking into account the Möbius transform, it is possible to define a family
of fuzzy measures on the basis of the largest set $A$ with non-null $m(A)$.
This family of fuzzy measures is called k-order additive fuzzy measures, where
$k$ is the cardinality of such a largest set $A$.


Definition 5.17. Let $\mu$ be a fuzzy measure and let $m$ be its Möbius
transform; then, $\mu$ is a k-order additive fuzzy measure if $m(S) = 0$ for any
$S \subseteq X$ such that $|S| > k$, and there exists at least one
$S \subseteq X$ with $|S| = k$ such that $m(S) \neq 0$.


It is easy to see that any fuzzy measure can be represented as a k-order
additive fuzzy measure with an appropriate value of $k$. Thus, the sets of
k-order additive fuzzy measures on $X$, for $k = 1, \ldots, |X|$, form a
partition of the set of all fuzzy measures on $X$.

This family of measures can be seen as a generalization of additive ones,
as the 1-order additive fuzzy measures are exactly the additive measures. In
fact, understanding the Möbius transform as a function that makes explicit the
interactions between the information sources, k-order additive fuzzy measures
stand for measures where the interactions can only be expressed up to dimension
$k$, but not for larger dimensions. When $k = 2$, only binary interactions are
allowed, while when $k = |X|$, all kinds of interactions are permitted.

It is clear that the value of k corresponds to the complexity of the measure, 
and, thus, the number of parameters needed for its determination increases 
when k increases. The next proposition makes this fact concrete. 


Proposition 5.18. Let $X$ be a set of cardinality $N$. Then, a k-order additive
fuzzy measure requires

$$\sum_{j=1}^{k} \binom{N}{j}$$

parameters in order to be defined.
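
The count in Proposition 5.18 and the order $k$ of Definition 5.17 are both easy to compute. In the sketch below, the function names are hypothetical, and the Möbius values are those of Table 5.3, entered by hand.

```python
from math import comb

def k_additive_parameter_count(n, k):
    """Number of Moebius coefficients of a k-order additive measure on n
    elements: the number of non-empty subsets of size at most k."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return sum(comb(n, j) for j in range(1, k + 1))

def additivity_order(m, tol=1e-12):
    """Largest cardinality |A| with m(A) != 0 (Definition 5.17)."""
    return max((len(a) for a, val in m.items() if abs(val) > tol), default=0)

# Moebius transform of Example 5.12, copied from Table 5.3.
m_table = {frozenset(k): val for k, val in {
    "": 0.0, "M": 0.45, "P": 0.45, "L": 0.3,
    "MP": -0.4, "ML": 0.15, "PL": 0.15, "MPL": -0.1}.items()}

print(k_additive_parameter_count(3, 1))    # additive measure on 3 elements
print(k_additive_parameter_count(10, 2))   # 2-order additive on 10 elements
print(additivity_order(m_table))
```

For ten sources, restricting the measure to $k = 2$ requires 55 values instead of the $2^{10} - 1 = 1023$ needed when $k = 10$, which illustrates why low-order measures are attractive in practice.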


A few other properties of fuzzy measures are also of interest. The next few 
definitions establish them. 


Definition 5.19. We say that a fuzzy measure $\mu$ is


1. k-order monotone (or k-monotone) for $k \ge 2$, if, for all families of $k$
subsets $A_1, \ldots, A_k$ of $X$,

$$\mu\Bigl(\bigcup_{i=1}^{k} A_i\Bigr) \ge \sum_{\emptyset \neq I \subseteq \{1, \ldots, k\}} (-1)^{|I|+1} \mu\Bigl(\bigcap_{i \in I} A_i\Bigr);$$

1-monotonicity is defined as monotonicity.
2. totally monotone if it is k-monotone for every $k \ge 1$.
3. k-order alternating (or k-alternating) for $k \ge 2$, if, for all families
of $k$ subsets $A_1, \ldots, A_k$ of $X$,

$$\mu\Bigl(\bigcap_{i=1}^{k} A_i\Bigr) \le \sum_{\emptyset \neq I \subseteq \{1, \ldots, k\}} (-1)^{|I|+1} \mu\Bigl(\bigcup_{i \in I} A_i\Bigr).$$


2-monotonicity is sometimes known as supermodularity or convexity;
2-alternating fuzzy measures are sometimes called submodular measures.
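
For $k = 2$, the conditions of Definition 5.19 reduce to comparing $\mu(A \cup B) + \mu(A \cap B)$ with $\mu(A) + \mu(B)$. The sketch below tests both conditions by brute force; the function names are our own, and the two sample measures are the strongest measure of Definition 5.5 on a three-element set and the measure of Example 5.12.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_2_monotone(mu, tol=1e-12):
    """Supermodularity: mu(A | B) + mu(A & B) >= mu(A) + mu(B)."""
    return all(mu[a | b] + mu[a & b] >= mu[a] + mu[b] - tol
               for a in mu for b in mu)

def is_2_alternating(mu, tol=1e-12):
    """Submodularity: mu(A | B) + mu(A & B) <= mu(A) + mu(B)."""
    return all(mu[a | b] + mu[a & b] <= mu[a] + mu[b] + tol
               for a in mu for b in mu)

strongest = {a: (1.0 if a else 0.0) for a in powerset("abc")}
example_5_12 = {frozenset(k): val for k, val in {
    "": 0.0, "M": 0.45, "P": 0.45, "L": 0.3,
    "MP": 0.5, "ML": 0.9, "PL": 0.9, "MPL": 1.0}.items()}

print(is_2_monotone(strongest), is_2_alternating(strongest))
print(is_2_monotone(example_5_12), is_2_alternating(example_5_12))
```

The measure of Example 5.12 satisfies neither condition, because it mixes a redundant pair ($\{M, P\}$) with supporting pairs ($\{M, L\}$ and $\{P, L\}$).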


5.2 Belief and Plausibility Measures 


The mathematical theory of evidence is based on the belief and plausibility
measures. They are fuzzy measures that satisfy some additional constraints
(see Definitions 5.20 and 5.21 below), and that can be easily defined using the
Möbius transform.


Definition 5.20. A fuzzy measure $Bel$ on a set $X$ is a belief measure (also
called a belief function) if and only if it satisfies (i) and (ii) in
Definition 5.1 and the following inequality:

$$Bel(A_1 \cup \ldots \cup A_n) \ge \sum_{j} Bel(A_j) - \sum_{j<k} Bel(A_j \cap A_k) + \ldots + (-1)^{n+1} Bel(A_1 \cap \ldots \cap A_n). \qquad (5.3)$$


Definition 5.21. A fuzzy measure $Pl$ on a set $X$ is a plausibility measure if
and only if it satisfies (i) and (ii) in Definition 5.1 and the following
inequality:

$$Pl(A_1 \cap \ldots \cap A_n) \le \sum_{j} Pl(A_j) - \sum_{j<k} Pl(A_j \cup A_k) + \ldots + (-1)^{n+1} Pl(A_1 \cup \ldots \cup A_n). \qquad (5.4)$$


Belief and plausibility functions are dual in the sense of Definition 5.6.
That is, given a belief measure $Bel$, its dual is a plausibility measure, and
the dual of this plausibility measure is $Bel$.

There is an alternative and equivalent definition of belief and plausibility
measures based on the basic probability assignment function (bpa). This
function corresponds to a Möbius transform that is non-negative and restricted
to $[0, 1]$ for all $A \subseteq X$. The concepts established in Definitions
5.20 and 5.21 are equivalent to those defined in terms of basic probability
assignments. This is so because for each dual pair of belief and plausibility
measures there is a corresponding bpa, and for each bpa there is a dual pair of
belief and plausibility measures.


Definition 5.22. A function $m : \wp(X) \to [0, 1]$ is a basic probability
assignment if and only if
(i) $m(\emptyset) = 0$
(ii) $\sum_{A \subseteq X} m(A) = 1$

Given such a function, the two fuzzy measures are built in the following
way.

Proposition 5.23. Let $m$ be a basic probability assignment defined on the
reference set $X$. Then, the following holds:
1. The function $Bel_m : \wp(X) \to [0, 1]$ defined by
$$Bel_m(A) := \sum_{B \subseteq A} m(B) \text{ for all } A \subseteq X \qquad (5.5)$$
is a belief measure (we will call it the belief measure induced from $m$).
2. The function $Pl_m : \wp(X) \to [0, 1]$ defined by
$$Pl_m(A) := \sum_{B \cap A \neq \emptyset} m(B) \text{ for all } A \subseteq X$$
is a plausibility measure.
3. The belief and plausibility measures induced from $m$ are dual
($Bel_m(A) = 1 - Pl_m(X \setminus A)$ for all $A \subseteq X$).
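
The constructions in Proposition 5.23 can be sketched in a few lines. In the code below, the function names and the basic probability assignment are hypothetical; sets that do not appear in the dictionary `m` are assumed to have mass 0.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def belief(m, universe):
    """Bel(A): total mass of the sets B contained in A (Equation 5.5)."""
    return {a: sum(val for b, val in m.items() if b <= a)
            for a in powerset(universe)}

def plausibility(m, universe):
    """Pl(A): total mass of the sets B that intersect A."""
    return {a: sum(val for b, val in m.items() if b & a)
            for a in powerset(universe)}

# Illustrative bpa on X = {a, b, c}; only sets with non-zero mass are listed.
X = "abc"
m = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}

bel, pl = belief(m, X), plausibility(m, X)
full = frozenset(X)
print(round(bel[frozenset("ab")], 2), round(pl[frozenset("b")], 2))
print(all(abs(bel[a] - (1.0 - pl[full - a])) < 1e-9 for a in bel))
```

The last line checks item 3 of the proposition numerically on every subset.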


The bpa can be understood as an assignment of an amount of informa- 
tion that one commits specifically to A (and not to any subset of A). This 
information might refer to fuzziness, importance, or probability. 

The reverse construction, i.e., building the basic probability assignment
from a belief function $Bel$, is given by the following expression:

$$m(A) = \sum_{B \subseteq A} (-1)^{|A| - |B|} Bel(B). \qquad (5.6)$$

Note that this expression is the Möbius transform (given in Definition 5.14),
and, thus, it can be computed for any fuzzy measure $\mu$. Nevertheless, when
the measure is not a belief measure, Expression 5.6 leads to some negative
values. This is the case in Example 5.15, where $m$ is negative for the set
$\{M, P\}$: $m(\{M, P\}) = \mu(\{M, P\}) - \mu(\{M\}) - \mu(\{P\}) = -0.4$.

For belief measures, the following result holds.


Proposition 5.24. Let $Bel$ be a belief measure and $Pl$ be a plausibility
measure induced from a basic probability assignment $m$. Then, if
$Bel(A) = Pl(A)$ for all $A \subseteq X$, $m$ focuses only on singletons (i.e.,
$m(A) = 0$ for all $|A| > 1$).


In this case, m corresponds to the probability distribution. In fact, additive 
measures (probability measures) can be seen as both belief and plausibility 
ones, as the additivity axiom implies both inequalities 5.3 and 5.4. 

Basic probability assignments are of practical interest because they
simplify the definition of a measure. When, instead of an arbitrary fuzzy
measure, a bpa is considered for defining either $Bel$ or $Pl$, we need only to
check whether $m$ is non-negative and whether the values add to 1. This has a
cost of $2^N$, where $N$ is the number of elements. In contrast, for an
arbitrary fuzzy measure, we need to check monotonicity (i.e., whether
$\mu(A) \ge \mu(B)$ for all $A \supseteq B$). This corresponds to several checks
(between 0 and $N$) for each of the $2^N$ subsets. Therefore, it is easier to
have fuzzy measures that are $Bel$ and $Pl$ by construction. This requires a
non-negative $m$ and the rescaling of its values if their sum is greater than 1.


5.2.1 Belief Measures from Unconstrained Ones 


We consider the definition of belief measures from a Möbius transform. We
show how this is applied to the fuzzy measure in Example 5.12 and to the
Möbius transform in Example 5.15. We will define a basic probability
assignment by applying a translation (that leads to $m'$) and a normalization
(resulting in $m''$) to the measure's Möbius transform. The function $m''$ that
the transformation gives is a basic probability assignment. This is based on
the following result, which shows that the translation and normalization
maintain the ordering between any two sets.

Table 5.4. Fuzzy measure on the set $X = \{M, P, L\}$ that is consistent with
the measure in Example 5.12





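Proposition 5.25. Let $\mu$ be a fuzzy measure and let $m$ be its Möbius
transform, with $m(A) < 0$ for some $A \subseteq X$. Then, the function $m''$
defined as $m''(\emptyset) = 0$ and as

$$m'(A) = m(A) - \min_{C \subseteq X} m(C) \quad \text{for all } \emptyset \subset A \subseteq X$$
$$m''(A) = m'(A) \Big/ \sum_{\emptyset \subset D \subseteq X} m'(D) \quad \text{for all } \emptyset \subset A \subseteq X$$

satisfies

$$\text{if } \mu(A) > \mu(B) \text{ then } \mu_{m''}(A) > \mu_{m''}(B) \text{ for all } A, B, \qquad (5.7)$$

where $\mu_{m''}(A) = \sum_{B \subseteq A} m''(B)$ is the measure defined from
$m''$.

Proof. Note that defining

$$m''(\emptyset) = \frac{m(\emptyset) - \min_{C \subseteq X} m(C)}{\sum_{D \subseteq X} \bigl(m(D) - \min_{C \subseteq X} m(C)\bigr)}$$

would lead to $\mu_{m''}(\emptyset) \neq 0$. Nevertheless, Equation 5.7 is also
satisfied when $A = \emptyset$ or $B = \emptyset$. This is so because (i) if
$A = \emptyset$, there is no $B$ such that $\mu(A) = \mu(\emptyset) = 0 > \mu(B)$
and (ii) if $B = \emptyset$, then $\mu(A) > \mu(\emptyset) = 0$ and
$\mu_{m''}(A) > \mu_{m''}(B) = 0$, because having $\mu_{m''}(A) \le 0$ would
require that $\mu(A) \le \min_{C \subseteq X} m(C)$, but, as
$\min_{C \subseteq X} m(C)$ is negative, this is impossible.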


The proposition above permits us to obtain the following result on consis- 
tency (consistency was defined in Definition 5.10). 


Corollary 5.26. For each unconstrained fuzzy measure $\mu$, there exists a
belief measure consistent with $\mu$.


Nevertheless, $\mu$ and $\mu_{m''}$ are not necessarily consistent, as we might
have $\mu_{m''}(A) > \mu_{m''}(B)$ when $\mu(A) \le \mu(B)$. Table 5.4
illustrates this case: $\mu_{m''}(\{x_1, x_2\}) > \mu_{m''}(\{x_2\})$ but
$\mu(\{x_1, x_2\}) = \mu(\{x_2\})$.

With regard to this proposition, we define a fuzzy measure $Bel$ that is a
belief function and that is consistent with the measure $\mu$ in Example 5.12 in
the sense that $Bel(A) > Bel(B)$ if and only if $\mu(A) > \mu(B)$.


Example 5.27. Let $\mu$ be the fuzzy measure in Example 5.12 and let $m$ be its
Möbius transform; let $m''$ be the basic probability assignment defined from
$m$ using Proposition 5.25. Then, the measure $\mu_{m''}$ defined from $m''$ is
given in Table 5.5, and the basic probability assignment is given in Table 5.6.

$\mu(\emptyset) = 0$       | $\mu(\{M, L\}) = 0.5526$
$\mu(\{M\}) = 0.2237$      | $\mu(\{P, L\}) = 0.5526$
$\mu(\{P\}) = 0.2237$      | $\mu(\{M, P\}) = 0.4474$
$\mu(\{L\}) = 0.1842$      | $\mu(\{M, P, L\}) = 1$
Table 5.5. Fuzzy measure on the set $X = \{M, P, L\}$ that is consistent with
the measure in Example 5.12





$m(\emptyset) = 0$       | $m(\{M, L\}) = 0.1447$
$m(\{M\}) = 0.2237$      | $m(\{P, L\}) = 0.1447$
$m(\{P\}) = 0.2237$      | $m(\{M, P\}) = 0$
$m(\{L\}) = 0.1842$      | $m(\{M, P, L\}) = 0.0790$
Table 5.6. Möbius transform of the measure given in Table 5.5
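
The translation and normalization of Proposition 5.25 can be sketched as follows, starting from the Möbius values of Table 5.3. The function names `subsets` and `to_bpa` are our own; the normalizing sum is taken over the non-empty subsets only, which is the reading that reproduces the tabulated values.

```python
from itertools import combinations

def subsets(s):
    """All subsets of the set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def to_bpa(m):
    """Translate m by its minimum and normalize (Proposition 5.25)."""
    nonempty = [a for a in m if a]
    shift = min(m.values())                       # negative by assumption
    m1 = {a: m[a] - shift for a in nonempty}      # translation: m'
    total = sum(m1.values())
    m2 = {a: m1[a] / total for a in nonempty}     # normalization: m''
    m2[frozenset()] = 0.0
    return m2

# Moebius transform of Example 5.12 (Table 5.3).
m = {frozenset(k): val for k, val in {
    "": 0.0, "M": 0.45, "P": 0.45, "L": 0.3,
    "MP": -0.4, "ML": 0.15, "PL": 0.15, "MPL": -0.1}.items()}

m2 = to_bpa(m)
bel = {a: sum(m2[b] for b in subsets(a)) for a in m2}   # Equation 5.5

for key in ["M", "L", "MP", "ML", "MPL"]:
    print(key, round(m2[frozenset(key)], 4), round(bel[frozenset(key)], 4))
```

The computed values agree with Tables 5.5 and 5.6 up to rounding; direct computation gives $m''(\{M, P, L\}) = 0.3/3.8 \approx 0.0789$, which the table lists as 0.0790.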





5.2.2 Possibility and Necessity Measures 


Possibility and necessity measures are particular cases of belief and plausibil- 
ity. We define them below and give some of their properties. 


Definition 5.28. A fuzzy measure $Pos$ on a set $X$ is a possibility measure if
it satisfies

$$Pos(A \cup B) = \max(Pos(A), Pos(B)). \qquad (5.8)$$

A fuzzy measure $Nec$ on a set $X$ is a necessity measure if it satisfies

$$Nec(A \cap B) = \min(Nec(A), Nec(B)). \qquad (5.9)$$

These equations, together with duality (Equation 5.1), establish a tight
relation between the two measures, which is stated in the next proposition.


Proposition 5.29. Let $Nec$ be a necessity measure and let $Pos$ be its dual
possibility measure; then, the following implications hold for all
$A \in \wp(X)$:

$Nec(A) > 0$ implies $Pos(A) = 1$
$Pos(A) < 1$ implies $Nec(A) = 0$
To define possibility measures, two alternative approaches can be used: 


1. In a way similar to probability measures, which can be determined by
probability distributions (or weighting vectors), possibility measures can be
determined by possibility distributions. This is based on Equation 5.8, which
resembles the additivity axiom ($\mu(A \cup B) = \mu(A) + \mu(B)$) with
addition replaced by maximum.

2. As necessity and possibility measures are, respectively, belief and
plausibility measures, we can use basic probability assignments to define them.

We formalize both approaches below. First, we establish the relation
between possibility measures and possibility distributions.


Definition 5.30. A possibility distribution is a mapping $\pi$ from $X$ to
$[0, 1]$ such that $\max_{x \in X} \pi(x) = 1$.


Proposition 5.31. Possibility measures and possibility distributions can be
built one from the other:


1. Every possibility measure $Pos$ is uniquely determined by a possibility
distribution function $\pi$, defined as follows:

$$\pi(x_i) := Pos(\{x_i\}) \text{ for all } x_i \in X$$

2. Let $\pi$ be a possibility distribution defined over the set $X$; then the
function $Pos : \wp(X) \to [0, 1]$ defined as

$$Pos(A) := \max_{a \in A} \pi(a) \text{ for all } A \subseteq X$$

is a possibility measure (the possibility measure inferred from $\pi$).


Proposition 5.32. Let $\pi$ be a possibility distribution; then
$Pos(A) = \max_{a \in A} \pi(a)$ and $Nec(A) = 1 - \max_{a \notin A} \pi(a)$
are dual (in the sense of Definition 5.6).


This result implies that the definition of $Pos$ only requires a value for each
element in $X$. Thus, the number of values required to define $Pos$ is $|X|$.
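
Propositions 5.31 and 5.32 can be illustrated with a small possibility distribution. The sketch below uses hypothetical function names and invented values of $\pi$; the maximum over an empty set is taken to be 0.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def possibility(pi):
    """Pos(A) = max of pi over A."""
    return {a: max((pi[x] for x in a), default=0.0) for a in powerset(pi)}

def necessity(pi):
    """Nec(A) = 1 - max of pi over the complement of A."""
    return {a: 1.0 - max((pi[x] for x in pi if x not in a), default=0.0)
            for a in powerset(pi)}

# Illustrative possibility distribution (its maximum must be 1).
pi = {"a": 1.0, "b": 0.7, "c": 0.2}
pos, nec = possibility(pi), necessity(pi)

print(pos[frozenset("bc")], round(nec[frozenset("a")], 2))
# Proposition 5.29: Nec(A) > 0 implies Pos(A) = 1.
print(all(pos[a] == 1.0 for a in pos if nec[a] > 0))
```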

Possibility measures correspond to a restricted type of basic probability
assignments: the consonant ones. We now define these assignments and, after
that, we show how such assignments can be extracted from the measures. The
measures are obtained from the assignments following Equation 5.5.


Definition 5.33. A basic probability assignment $m$ is consonant if and only
if the non-zero values of $m$ are assigned to a family of nested subsets of $X$.
That is, there is a complete sequence of nested subsets

$$A_1 \subset A_2 \subset \ldots \subset A_n = X$$

and $m(B) = 0$ for all $B \notin \{A_1, A_2, \ldots, A_n\}$.


Proposition 5.34. Let $Pos$ be a possibility measure on
$X = \{x_1, \ldots, x_n\}$, and let $\pi$ be its corresponding possibility
distribution (with no loss of generality, we assume that
$1 = \pi(x_1) \ge \pi(x_2) \ge \ldots \ge \pi(x_n)$); then, the corresponding
basic probability assignment $m$ is of the form

$$m(A) = \begin{cases} \pi(x_i) - \pi(x_{i+1}) & \text{if } A = \{x_1, \ldots, x_i\} \text{ for } i = 1, \ldots, n-1 \\ \pi(x_n) & \text{if } A = X \\ 0 & \text{otherwise.} \end{cases}$$

Possibility measures do not satisfy additivity, and are the minimal measure
that can be built from $\pi$ (in the sense of Definition 5.13).
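The passage from a possibility distribution to its consonant assignment can be sketched as follows. We assume here that the measure is recovered from the assignment as a plausibility, that is, by adding the masses of the focal sets that intersect $A$; the distribution `pi` is a hypothetical example and the function names are ours.

```python
def consonant_bpa(pi):
    """Consonant assignment of Proposition 5.34 as {focal set: mass}."""
    order = sorted(pi, key=pi.get, reverse=True)   # decreasing possibility
    values = [pi[x] for x in order] + [0.0]        # trailing 0 handles A = X
    bpa = {}
    for i in range(len(order)):
        mass = values[i] - values[i + 1]
        if mass > 0:                               # keep only focal sets
            bpa[frozenset(order[:i + 1])] = mass
    return bpa

def plausibility(bpa, A):
    """Sum of the masses of the focal sets that intersect A."""
    A = frozenset(A)
    return sum(mass for B, mass in bpa.items() if B & A)

# Hypothetical possibility distribution.
pi = {"x1": 1.0, "x2": 0.7, "x3": 0.2}
bpa = consonant_bpa(pi)
# Focal sets are nested: {x1}, {x1, x2}, {x1, x2, x3}.
```

With these values, `plausibility(bpa, {"x2", "x3"})` coincides (up to rounding) with $\max(0.7, 0.2) = 0.7$, the possibility of that set.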

5.3 $\perp$-Decomposable Fuzzy Measures


In previous sections, we have studied additive measures and possibility
measures. In both cases, the measure of a set is composed from the measure on
the singletons. That is, the measure is composed either from a probability or
a possibility distribution. The difference lies in whether the composition
is achieved using addition or using maximum. In this section, we describe
$\perp$-decomposable fuzzy measures. In such measures, the composition is in
terms of a t-conorm $\perp$.


Definition 5.35. A fuzzy measure $\mu$ on a set $X$ is a $\perp$-decomposable fuzzy
measure if there exists a t-conorm $\perp$ such that, for all $A, B \subseteq X$ with
$A \cap B = \emptyset$, it holds

$$\mu(A \cup B) = \mu(A) \perp \mu(B).$$


Note that, in this definition, the t-conorm guarantees monotonicity. 

$\perp$-decomposable fuzzy measures can be defined on the basis of a
t-conorm and a set of values for the singletons. The only requirement is that
the combination of all such values using the t-conorm be equal to 1. This is
required for the boundary condition $\mu(X) = 1$.


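Proposition 5.36. Let $\perp$ be a t-conorm and let $v : X \to [0,1]$ be such that

$$v(x_1) \perp \cdots \perp v(x_n) = 1;$$

then, the fuzzy measure defined by

$$\mu(A) = \begin{cases}
v(x_i) & \text{if } A = \{x_i\} \text{ for any } i = 1, \ldots, |X| \\
\perp_{x_i \in A} v(x_i) & \text{otherwise}
\end{cases}$$

is a $\perp$-decomposable measure.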


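An alternative expression exists for $\mu(A)$ when the t-conorm is an Archimedean
one with a known increasing generator $g$ (recall Theorem 2.49):

$$\mu(A) = \begin{cases}
v(x_i) & \text{if } A = \{x_i\} \text{ for any } i = 1, \ldots, |X| \\
g^{(-1)}\big(\sum_{x_i \in A} g(v(x_i))\big) & \text{otherwise.}
\end{cases}$$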


Note that a probability measure can be seen as a $\perp$-decomposable fuzzy
measure with $\perp(x, y) = \min(x + y, 1)$ (Łukasiewicz t-conorm). However,
the converse is not always true, because it could be the case
that $\sum_{x_i \in X} \mu(\{x_i\}) > 1$, and in this case the Łukasiewicz t-conorm
returns 1. An example of this case is given below. When the equality holds, i.e.,
$\sum_{x_i \in X} \mu(\{x_i\}) = 1$, the mapping $v : X \to [0,1]$ is a weighting
vector, and $\mu$ is additive.
Example 5.37. Let $X = \{x_1, x_2, x_3, x_4, x_5\}$, let $\perp(x, y) = \min(x + y, 1)$
be the Łukasiewicz t-conorm, and let $v : X \to [0,1]$ be defined by $v(x_1) = 0.05$,
$v(x_2) = 0.1$, $v(x_3) = 0.2$, $v(x_4) = 0.4$, and $v(x_5) = 0.8$. The corresponding
$\perp$-decomposable fuzzy measure is not an additive one. Note, for example,
that

$$\mu(\{x_2, x_3, x_4, x_5\}) = 1 \neq \mu(\{x_2, x_3\}) + \mu(\{x_4, x_5\}) = 1.3.$$
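The computation in this example can be reproduced with a short sketch that folds the singleton values with the t-conorm, starting from 0 (the neutral element of any t-conorm). The function names are ours; the values of `v` are those of Example 5.37.

```python
from functools import reduce

def lukasiewicz(x, y):
    """Lukasiewicz t-conorm (bounded sum)."""
    return min(x + y, 1.0)

def decomposable_measure(tconorm, v, A):
    """Combine the singleton values of the elements of A with the t-conorm."""
    return reduce(tconorm, (v[x] for x in A), 0.0)

v = {"x1": 0.05, "x2": 0.1, "x3": 0.2, "x4": 0.4, "x5": 0.8}

def mu(A):
    return decomposable_measure(lukasiewicz, v, A)

whole = mu({"x2", "x3", "x4", "x5"})            # capped at 1
parts = mu({"x2", "x3"}) + mu({"x4", "x5"})     # about 0.3 + 1 = 1.3
```

Passing Python's built-in `max` as the t-conorm in `decomposable_measure` yields the maximum of the singleton values instead of their bounded sum.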


Another particular case of these measures is when the t-conorm is the 
maximum. Then, the corresponding measure is a possibility one. 

5.3.1 Sugeno $\lambda$-measures

Sugeno $\lambda$-measures are also an example of $\perp$-decomposable fuzzy measures.


Definition 5.38. Let $\mu$ be a fuzzy measure; then, $\mu$ is a Sugeno $\lambda$-measure if
for some fixed $\lambda > -1$ it holds that

$$\mu(A \cup B) = \mu(A) + \mu(B) + \lambda \mu(A) \mu(B) \qquad (5.10)$$

for all $A \cap B = \emptyset$.


Therefore, the definition of a measure only requires the values for all the
singletons and $\lambda$. The following proposition establishes this fact and gives
the expression for the general case.


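Proposition 5.39. Let $v : X \to [0,1]$ and $\lambda > -1$ be such that

$$\begin{cases}
(1/\lambda)\big(\prod_{x_i \in X} [1 + \lambda v(x_i)] - 1\big) = 1 & \text{if } \lambda \neq 0 \\
\sum_{x_i \in X} v(x_i) = 1 & \text{if } \lambda = 0;
\end{cases}$$

then, the fuzzy measure defined by

$$\mu(A) = \begin{cases}
v(x_i) & \text{if } A = \{x_i\} \\
(1/\lambda)\big(\prod_{x_i \in A} [1 + \lambda v(x_i)] - 1\big) & \text{if } |A| \neq 1 \text{ and } \lambda \neq 0 \\
\sum_{x_i \in A} v(x_i) & \text{if } |A| \neq 1 \text{ and } \lambda = 0
\end{cases}$$

is a Sugeno $\lambda$-measure.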


As stated, Sugeno $\lambda$-measures are a particular case of $\perp$-decomposable
ones. This is implied by Equation 5.10 and the additional requirement that
the Sugeno $\lambda$-measure is a fuzzy measure (and thus bounded by 1). As the
value 1 cannot be exceeded, Equation 5.10 can be considered as if values $\mu(A)$
and $\mu(B)$ were combined using the following t-conorm:

$$\perp(x, y) = \min(1, x + y + \lambda x y). \qquad (5.11)$$

This is Sugeno's t-conorm, as can be seen in Example 2.48.
For countable sets, Sugeno $\lambda$-measures are a special subclass of belief and
plausibility measures. For $\lambda > 0$, the measure is a belief one, and for
$\lambda < 0$, the measure is a plausibility one. Note that $\lambda = 0$ corresponds to
both types and, also, to additive fuzzy measures.

We have seen that a Sugeno $\lambda$-measure is determined from the values
$\mu(\{x_i\})$ and $\lambda$. In fact, as shown below, an important result proves that the
measure on the singletons completely determines $\lambda$. Accordingly, a measure
of this family solely requires $|X|$ values in order to be defined.


Proposition 5.40. Let $\mu$ be a Sugeno $\lambda$-measure; then, for a fixed set of
$0 < \mu(\{x_i\}) < 1$, there exists a unique $\lambda \in (-1, +\infty)$ and $\lambda \neq 0$ that
satisfies $\mu(X) = 1$, that is, satisfies

$$\lambda + 1 = \prod_{i=1}^{n} (1 + \lambda \mu(\{x_i\})).$$


This proposition exploits the fact that

$$\mu(X) = (1/\lambda)\big(\prod_{x_i \in X} [1 + \lambda v(x_i)] - 1\big) = 1.$$

The proposition establishes that, given the $|X|$ values for the singletons,
we find the suitable value for $\lambda$ by solving a polynomial of degree $n - 1$.


Example 5.41. Let $\mu$ be a Sugeno $\lambda$-measure on $\{M, P, L\}$, with
$\mu(\{M\}) = 0.2237$, $\mu(\{P\}) = 0.2237$, and $\mu(\{L\}) = 0.1842$. Then, using
Proposition 5.40, we get $\lambda = 2.3854$. This value for $\lambda$ is obtained as follows:

$$\lambda + 1 = (1 + 0.2237\lambda) \cdot (1 + 0.2237\lambda) \cdot (1 + 0.1842\lambda).$$

Thus,

$$0 = -0.3684\lambda + 0.13245277\lambda^2 + 0.0092176795\lambda^3.$$

The solutions of this equation are

$$\lambda_1 = 0, \qquad \lambda_2 = 2.385385, \qquad \lambda_3 = -16.754812.$$

The only acceptable value is $\lambda_2$, as the others violate some of the
constraints of fuzzy measures. In particular, $\lambda_1$ implies that

$$\mu(X) = 0.2237 + 0.2237 + 0.1842 \neq 1,$$

and $\lambda_3$ (which is invalid, as it is not larger than $-1$) leads to negative
values for $\mu$. For example,

$$\mu(\{M, P\}) = 0.2237 + 0.2237 + 0.2237 \cdot 0.2237 \cdot \lambda_3 = -0.39103916.$$

In Table 5.7 we show the values obtained for $\mu$ using $\lambda_2$.

Note that the values for the singletons correspond to the values given for
the same sets in Example 5.27. Nevertheless, the values for the other sets are
not the same, as the measure in Example 5.27 is not a $\lambda$-measure.

$\mu(\emptyset) = 0$       | $\mu(\{M, P\}) = 0.5667687$
$\mu(\{M\}) = 0.2237$    | $\mu(\{M, L\}) = 0.5061911$
$\mu(\{P\}) = 0.2237$    | $\mu(\{P, L\}) = 0.5061911$
$\mu(\{L\}) = 0.1842$    | $\mu(\{M, P, L\}) = 1$

Table 5.7. Sugeno $\lambda$-measure on the set $X = \{M, P, L\}$ with the measures on
the singletons equal to the ones in Example 5.27
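The search for $\lambda$ can also be carried out numerically. The sketch below uses plain bisection on $\prod_i (1 + \lambda v_i) - (1 + \lambda)$ instead of solving the polynomial symbolically; the bracketing interval, the upper bound `hi`, and the function names are our own assumptions, not part of the text.

```python
from math import prod

def solve_lambda(values, hi=1e6):
    """Nonzero root lambda > -1 of prod(1 + lambda*v_i) = 1 + lambda (bisection)."""
    def F(lam):
        return prod(1.0 + lam * x for x in values) - (1.0 + lam)
    s = sum(values)
    if abs(s - 1.0) < 1e-12:
        return 0.0                       # additive case
    # If the singleton values add up to less than 1 the root is positive,
    # otherwise it lies in (-1, 0).
    lo, hi = (1e-9, hi) if s < 1 else (-1.0 + 1e-12, -1e-9)
    for _ in range(200):
        mid = (lo + hi) / 2
        if F(lo) * F(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def sugeno_measure(v, lam, A):
    """Closed form of Proposition 5.39 for a set A of element names."""
    if abs(lam) < 1e-12:
        return sum(v[x] for x in A)
    return (prod(1.0 + lam * v[x] for x in A) - 1.0) / lam

v = {"M": 0.2237, "P": 0.2237, "L": 0.1842}
lam = solve_lambda(list(v.values()))          # close to 2.385385
mu_ML = sugeno_measure(v, lam, {"M", "L"})    # close to 0.5061911
```

Bisection is chosen only because it needs no external library; any polynomial root finder would serve equally well.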





Several results are known about these fuzzy measures. A few are given 
below. First, we review aspects related to the monotonicity of two measures. 


Proposition 5.42. Let $\mu_1$ and $\mu_2$ be two Sugeno $\lambda$-measures with parameters
$\lambda_1$ and $\lambda_2$, respectively. Then, if $\mu_1(\{x_i\}) > \mu_2(\{x_i\})$ for all
$x_i \in X$, it holds that $\lambda_1 < \lambda_2$.


When $\lambda$ tends to $-1$, the measure tends to be such that $\mu(A) = 1$ for all
$A \neq \emptyset$ and $\mu(\emptyset) = 0$. In contrast, when $\lambda$ tends to $+\infty$, the
measure tends to be 0 for all $A \neq X$ and 1 for $A = X$.

Now, we review another result on the relationship between fuzzy measures 
and their dual ones. 


Proposition 5.43. Let $\mu_\lambda$ be a Sugeno $\lambda$-measure; then, the following holds:


1. The dual of $\mu_\lambda$ is $\mu_{-\lambda/(\lambda+1)}$. That is, if $\mu_\lambda$ is a belief,
then $\mu_{-\lambda/(\lambda+1)}$ is a plausibility.
2. The value that separates a set $A$ and its complement $X \setminus A$ is

$$\frac{-1 + \sqrt{1 + \lambda}}{\lambda}.$$

That is, for a given $\mu_\lambda$, if $\mu_\lambda(A) > (-1 + \sqrt{1 + \lambda})/\lambda$, then
$\mu_\lambda(X \setminus A) < (-1 + \sqrt{1 + \lambda})/\lambda$. Note that, for additive
measures (probabilities), this value is 0.5, because if $P(A) > 0.5$, then
$P(X \setminus A) = 1 - P(A) < 0.5$.
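Both statements can be checked numerically. The following sketch uses the singleton values and the rounded $\lambda_2 = 2.385385$ of Example 5.41, so the identities only hold up to a small rounding error; the helper names are ours.

```python
from itertools import combinations
from math import prod, sqrt

v = {"M": 0.2237, "P": 0.2237, "L": 0.1842}
lam = 2.385385                      # rounded value taken from Example 5.41
X = frozenset(v)

def mu(A):
    """Sugeno lambda-measure (closed form for lambda != 0)."""
    return (prod(1.0 + lam * v[x] for x in A) - 1.0) / lam

def dual(A):
    """Dual measure: 1 - mu(X \\ A)."""
    return 1.0 - mu(X - frozenset(A))

lam_dual = -lam / (lam + 1.0)
threshold = (-1.0 + sqrt(1.0 + lam)) / lam

# Statement 1: the dual satisfies Equation 5.10 with parameter lam_dual.
lhs = dual({"M", "P"})
rhs = dual({"M"}) + dual({"P"}) + lam_dual * dual({"M"}) * dual({"P"})

# Statement 2: a set above the threshold has its complement below it.
subsets = [frozenset(c) for r in range(4) for c in combinations(sorted(X), r)]
separated = all(mu(X - A) < threshold for A in subsets if mu(A) > threshold)
```

Since $\lambda > 0$ here, `lam_dual` is negative, in agreement with the dual of a belief measure being a plausibility measure.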


5.3.2 Hierarchically $\perp$-Decomposable Fuzzy Measures


In $\perp$-decomposable fuzzy measures, the measure for a set $A$ is defined as
the combination through the t-conorm $\perp$ of the measures for the singletons
$\{a_i\} \subseteq A$. In some sense, the combination is homogeneous for all the
elements in $A$, as all elements are combined using the same t-conorm.
Hierarchically $\perp$-Decomposable Fuzzy Measures (HDFM) weaken this constraint,
allowing different t-conorms in the combination process. To allow for different
t-conorms, elements in $X$ are arranged into a hierarchical structure (a
dendrogram), and then a t-conorm is attached to each node. The measure for a set
$A \subseteq X$ is computed using the hierarchy, and, more precisely, using a kind
of projection of $A$ on the hierarchy. We formalize this below, starting with an
example.

Fig. 5.2. A Hierarchically $\perp$-Decomposable Fuzzy Measure: (a) hierarchy; (b)
projection of a set $A$ on the hierarchy; (c) hierarchy corresponding to the set $A$


Example 5.44. Figure 5.2 (a) displays a hierarchical structure to be used in the
definition of a hierarchically $\perp$-decomposable fuzzy measure. In this example,
$X$ is the set $X = \{x_1, x_2, x_3, \ldots, x_{10}\}$. The hierarchy consists of the
nodes $\{n_1, n_2, \ldots, n_9\}$, where each node $n_i$ has an attached t-conorm
$\perp_i$. The measure for a set $A$ is computed by decomposing $A$ into its
components following the outline given by the hierarchical structure for $X$.

Figure 5.2 (b) shows the projection of the set $A = \{x_3, x_4, x_5, x_7, x_8, x_9\}$
on the dendrogram of Figure 5.2 (a). Figure 5.2 (c) corresponds to Figure 5.2
(b) once unnecessary elements $x_i$ (elements not in $A$) as well as unnecessary
nodes (nodes linking elements in $A$ with elements not in $A$) are removed.

Using the t-conorms $\perp_i$ and the structure of Figure 5.2 (c), we can
compute the measure for the set $A$. This measure is computed bottom-up. For
simplicity, we can consider that we compute a value for each node, and that
the value for the root node $n_9$ is the measure for the set $A$. That is, the
value for node $n_4$ is $\perp_4(\mu(\{x_4\}), \mu(\{x_5\}))$. The value for the node
$n_3$ is:

$$\perp_3(\mu(\{x_3\}), value(n_4)) = \perp_3(\mu(\{x_3\}), \perp_4(\mu(\{x_4\}), \mu(\{x_5\}))).$$

The value for node $n_9$, which in this case is $\mu(A)$, is obtained in the same
way: $\perp_9$ combines the value for $n_3$, that is,
$\perp_3(\mu(\{x_3\}), \perp_4(\mu(\{x_4\}), \mu(\{x_5\})))$, with the value computed
bottom-up for the node that gathers the remaining elements of $A$.


To formalize these measures, we need, first, to formalize the hierarchy of 
elements. 


Definition 5.45. $H$ is a hierarchy of elements $X$ if and only if the following
conditions are fulfilled:


(i) All the elements in $X$ belong to the hierarchy, and the corresponding nodes
are the leaves of the hierarchy:
For all $x$ in $X$, $\{x\} \in H$.

(ii) There is only one root in the hierarchy, and it is denoted by $root$. A node
is the root if it is not included in any other node:
if $root \in H$, then there is no other node $m \in H$ such that $root \in m$.

(iii) All nodes belong to one and only one node, except for the root:
if $n \in H$ and $n \neq root$, then there exists a single $m \in H$ such that $n \in m$.

(iv) All nodes that contain only one element are singletons:
if $|h| = 1$, then there exists $x \in X$ such that $h = \{x\}$, for all $h \in H$.

(v) All non-singletons are defined in terms of nodes that are in the tree:
if $|h| \neq 1$, then, for all $h_i \in h$, $h_i \in H$.


Fig. 5.3. Hierarchy of elements corresponding to the subjects Mathematical Logic
(ML), Physics (P), Mathematics (M), Literature (L) and Greek (G). ScS stands
for Scientific Subjects and Hu stands for Humanities. The root $A$ has the children
ScS and Hu; ScS gathers ML, P and M; Hu gathers L and G


Definition 5.45 builds the hierarchy H, defining first a set of nodes, where 
each node is defined as a set of other nodes. This way, the definition given 
below for fuzzy measures of this kind is simple. Alternative definitions, where 
nodes are subsets of the set X, are also possible (and simpler), but then the 
definition of the measure is more complex. 

We consider below an example of a hierarchy that will be used later to
define a hierarchically $\perp$-decomposable fuzzy measure. The elements of this
hierarchy follow Example 4.28, and are an extension of those in Example 5.12.


Example 5.46. The evaluation of students in a high school is based on two 
sets of subjects: scientific subjects (ScS) and Humanities (Hu). The former 
set includes Mathematics (M), Physics (P), and Mathematical Logic (ML). 
The latter set includes Literature (L) and Greek (G). 

The hierarchy corresponding to these subjects is as follows (see Figure 5.3). 


$$H = \{\{ML\}, \{M\}, \{P\}, \{L\}, \{G\}, ScS, Hu, A\},$$

with

$$ScS = \{\{ML\}, \{P\}, \{M\}\}, \quad Hu = \{\{L\}, \{G\}\}, \quad \text{and} \quad A = \{ScS, Hu\}.$$


The definition of hierarchically $\perp$-decomposable fuzzy measures requires
two objects: the extension of a node and a labeled hierarchy. The former is 
defined over a node in the hierarchy as the set of elements in X that are 
embedded in that node. The labeled hierarchy assigns to each leaf in the 
hierarchy a real value in the unit interval, and, for each node that is not a 
leaf, a t-conorm. 
Definition 5.47. Let $H$ be a hierarchy according to Definition 5.45 and let $h$
be a node in $H$; then, the extension of $h$ in $H$ is defined as:

$$EXT(h) = \begin{cases}
h & \text{if } |h| = 1 \\
\bigcup_{h_i \in h} EXT(h_i) & \text{if } |h| \neq 1
\end{cases}$$


Definition 5.48. Let $H$ be a hierarchy according to Definition 5.45; then, a
labeled hierarchy $L$ for $H$ is a tuple $L = \langle H, \perp, m \rangle$, where $\perp$ is a
function that maps each node $n \in H$ that is not a leaf into a t-conorm, and $m$
is a function that maps each singleton into a value of the unit interval.
For simplicity, we will express $\perp(h)$ by $\perp_h$.


Labeled hierarchies define fuzzy measures. The measure of a set of elements 
is based on the values that the function m associates with the singletons 
(the elements of that set), and the t-conorms of the nodes with a nonempty 
intersection with the set. For a singleton, the value of m is considered the 
measure of the singleton. For sets, the measure is defined recursively using the 
nodes in the hierarchy. The following definition describes how the measure is 
computed from the hierarchy. 


Definition 5.49. Let $L = \langle H, \perp, m \rangle$ be a labeled hierarchy according to
Definition 5.48; then, the corresponding Hierarchically $\perp$-Decomposable Fuzzy
Measure (HDFM for short) of a set $B$ is defined as $\mu(B) = \mu_{root}(B)$, where
$\mu_A$ for a node $A = \{a_1, \ldots, a_n\}$ is defined recursively as

$$\mu_A(B) = \begin{cases}
0 & \text{if } |B| = 0 \\
m(B) & \text{if } |B| = 1 \\
\perp_A(\mu_{a_1}(B_1), \ldots, \mu_{a_n}(B_n)) & \text{if } |B| > 1.
\end{cases}$$

Here, $B_i = B \cap EXT(a_i)$ for all $a_i$ in $A$.


Proposition 5.50. When $\mu(X) = 1$, Definition 5.49 leads to a fuzzy measure.


Proof. Note that 


(i) the values of the measure belong to the unit interval. This is implied 
by the fact that the fuzzy measure is built only from the function m (a 
function into the unit interval) and the t-conorms (which are functions 
from [0, 1] x [0, 1] into [0, 1]). 

(ii) $\mu(\emptyset) = 0$ is implied by the definition.

(iii) Monotonicity is implied by the monotonicity of the t-conorm. 


Example 5.51. Let us consider the fuzzy measure in Example 5.12. This measure
can be represented as a hierarchically decomposable fuzzy measure with
$X = \{M, P, L\}$, the hierarchy $H$ as in Figure 5.4, $m$ defined according to
Example 5.12 (i.e., $m(M) = m(P) = 0.45$ and $m(L) = 0.3$), and the t-conorms
$\perp_{ScS}$ and $\perp_A$ defined as follows:

* $S_{ScS}(x, y) = (x^w + y^w)^{1/w}$, with $w = (\ln 2)/(\ln 0.5 - \ln 0.45) = 6.5788$
* $S_A(x, y) = f^{(-1)}(f(x) + f(y))$, where $f(x)$ is defined as:

$$f(x) = \begin{cases}
20x & \text{if } x \in [0, 1/2] \\
3 + 14x & \text{if } x \in [1/2, 3/4] \\
6 + 10x & \text{if } x \in [3/4, 1].
\end{cases}$$

Note that this hierarchy and the labelling lead to a fuzzy measure that is
equivalent to the one in Example 5.12 for all subsets of $X$.

Fig. 5.4. Hierarchy of elements for representing the fuzzy measure defined in
Example 5.12
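The recursive definition can be sketched in a few lines. In the fragment below a leaf is a string and an inner node is a pair (t-conorm, list of children); this encoding, the function names, and the shape assumed for the hierarchy of Figure 5.4 (root $A$ with children ScS and $L$, and ScS with children $M$ and $P$) are our assumptions. The generator $f$ is read as $20x$, $3 + 14x$ and $6 + 10x$ on the three intervals, and the cap at 1 in the first t-conorm is added by us.

```python
from functools import reduce
from math import log

def extension(node):
    """EXT(node): elements of X below the node (Definition 5.47)."""
    if isinstance(node, str):
        return {node}
    _, children = node
    return set().union(*(extension(c) for c in children))

def hdfm(node, m, B):
    """mu_node(B) computed recursively as in Definition 5.49."""
    B = set(B)
    if not B:
        return 0.0
    if len(B) == 1:
        return m[next(iter(B))]
    tconorm, children = node
    return reduce(tconorm, (hdfm(c, m, B & extension(c)) for c in children))

w = log(2) / (log(0.5) - log(0.45))

def s_scs(x, y):
    return min(1.0, (x ** w + y ** w) ** (1.0 / w))   # cap at 1 assumed

def f(x):
    return 20 * x if x <= 0.5 else 3 + 14 * x if x <= 0.75 else 6 + 10 * x

def f_pseudo_inverse(y):
    if y <= 10:
        return y / 20
    if y <= 13.5:
        return (y - 3) / 14
    return min(1.0, (y - 6) / 10)

def s_a(x, y):
    return f_pseudo_inverse(f(x) + f(y))

m = {"M": 0.45, "P": 0.45, "L": 0.3}
root = (s_a, [(s_scs, ["M", "P"]), "L"])

def mu(B):
    return hdfm(root, m, B)
```

Under these assumptions `mu` returns, up to rounding, 0.5 for $\{M, P\}$, 0.9 for $\{M, L\}$ and $\{P, L\}$, and 1 for $X$.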





Hierarchically decomposable fuzzy measures compute the measure for a 
subset B in X in terms of the measure of disjoint subsets of B. Then, the 
measures are combined using the t-conorm S of the smallest node in the 
tree that encompasses all the elements in B. The same procedure is applied 
recursively until we get values for the leaves/singletons (the values in m). This 
is shown in the following example. 


Example 5.52. Let X and H be as in Example 5.46 (i.e., X = {ML, P, M, L, G} 
and H as in Figure 5.3). Then, 


$$\mu(\{P, M, L, G\}) = \perp_A(\mu(\{P, M, L, G\} \cap EXT(ScS)), \mu(\{P, M, L, G\} \cap EXT(Hu))).$$

This can be further decomposed into

$$\mu(\{P, M, L, G\}) = \perp_A(\perp_{ScS}(m(P), m(M)), \perp_{Hu}(m(L), m(G))).$$


In the definition above we have considered that the nodes of the hierarchy
could gather two or more nodes. In fact, it is possible to constrain all nodes
to gather only two nodes. This is so because t-conorms are associative. The
following example illustrates this situation.


Example 5.53. Let $n$ be a node in $H$ defined by subnodes $n_i$, as in:
$n = \{n_1, n_2, n_3, \ldots, n_m\}$. Let $\perp_n$ be the t-conorm attached to $n$.
Then, this is equivalent to having nodes $n'_2 = \{n_1, n_2\}$,
$n'_3 = \{n'_2, n_3\}$, ..., $n'_m = \{n'_{m-1}, n_m\}$, with t-conorms
$\perp_{n'_i} = \perp_n$.

In the introduction of this section, we said that HDFMs generalize
$\perp$-decomposable fuzzy measures. In fact, the latter correspond to one-level
HDFMs.


Definition 5.54. Let $\mu$ be a Hierarchically Decomposable Fuzzy Measure on
$X$ with labeled hierarchy $L = \langle H, S, m \rangle$; then, if for each $x \in X$,
$\{x\} \in root$, we say that $\mu$ is a one-level HDFM.


Proposition 5.55. A one-level HDFM is a $\perp$-decomposable fuzzy measure
with $\perp = S_{root}$.


Definition 5.56. Let $\mu$ be an HDFM with labeled hierarchy $L = \langle H, S, m \rangle$;
then, $\mu$ is a two-level HDFM if, for each $x \in X$, it holds that there exists an
$n \in H$ such that $x \in n$ and $n \in root$.


Definition 5.57. Let $\mu$ be a two-level fuzzy measure; then, $\mu$ is an additive
two-level HDFM if $S_{root}(x, y) = \min(1, x + y)$.


Proposition 5.58. An additive two-level HDFM is an inter-additive fuzzy 
measure. 


The last result shows that inter-additive fuzzy measures can be seen as a
generalization of additive two-level HDFMs. In fact, any HDFM with
$\perp_{root} = +$ is an inter-additive measure.


5.4 Distorted Probabilities 


Another family of fuzzy measures is that of distorted probabilities. They are 
defined in terms of a probability distribution and a function that distorts them. 
We will review this family below and give some results that link such mea- 
sures with the ones presented previously. We start by defining when a fuzzy 
measure is represented by a function and a probability, and, then, restricting 
the function to be strictly increasing, we reach distorted probabilities. 


Definition 5.59. Let $f$ be a real-valued function on $[0,1]$ and let $P$ be a
probability measure on $(X, \wp(X))$. We say that $f$ and $P$ represent a fuzzy
measure $\mu$ on $(X, \wp(X))$ if and only if $\mu(A) = f(P(A))$ for all $A \in \wp(X)$.


Definition 5.60. Let $f$ be a real-valued function on $[0,1]$. We say that $f$
is strictly increasing with respect to a probability measure $P$ if and only if
$P(A) < P(B)$ implies $f(P(A)) < f(P(B))$. We say that $f$ is nondecreasing
with respect to a probability measure $P$ if and only if $P(A) < P(B)$ implies
$f(P(A)) \leq f(P(B))$.


Using the two definitions, we define distorted probabilities as follows. 

Fig. 5.5. Distorted probabilities: (a) computation of $P(A)$ and $P(B)$ for
$\mu = f \circ P$; (b) computation of $\mu(A)$ for the measure "at least around 50%
of the probability"; (c) discrete representation of the distortion function using
a weighting vector $w$


Definition 5.61. Let $\mu$ be a fuzzy measure on $(X, \wp(X))$. We say that $\mu$ is
a distorted probability if it is represented by a probability distribution $P$ on
$(X, \wp(X))$ and a function $f$ that is nondecreasing with respect to the
probability $P$.


Figure 5.5 (a) illustrates the computation of the measure $\mu$ when
$\mu = f \circ P$ for two sets $A$ and $B$ such that $P(A) \leq P(B)$. A few remarks
follow with respect to the previous definition.


1. The definition given above uses a nondecreasing function $f$. Nevertheless,
alternative definitions with strictly increasing $f$ are also used. If $f$
is strictly increasing, then $\mu = P$ and $\mu' = f \circ P$ are consistent fuzzy
measures (according to Definition 5.10). Instead, if $f$ is only a nondecreasing
function, then $\mu$ is consistent with $\mu'$, but $\mu'$ is not always consistent
with $\mu$.

2. For the sake of simplicity, since $X$ is a finite set, a strictly increasing
function $f$ with respect to $P$ can be regarded as a strictly increasing
function on $[0,1]$. Points other than $\{P(A) \mid A \in \wp(X)\}$ are not relevant
in our definition, as they are not really used to compute the measure. Note
that the function $f$ is only applied to $P(A)$ for all $A \subseteq X$.

3. Distortion functions $f$ can be seen as fuzzy quantifiers. Under this
interpretation, $f$ measures to what extent a given probability $P(A)$ satisfies
the quantifier. So, in the case of $f$ being a fuzzy quantifier $Q$, the measure
$Q \circ P$ stands for "$P(A)$ is $Q$." For example, if we consider the quantifier
"at least around 50%," we have that the fuzzy measure induced by
a probability $P$ and the quantifier $Q$, defined by $\mu = Q \circ P$, stands for the
measure "at least around 50% of the probability." So, for all sets $A \subseteq X$
with $P(A) \leq 0.4$, we will have $\mu(A) = 0$, and for all $P(A) \geq 0.6$, we will
have $\mu(A) = 1$. Besides, for those $A \subseteq X$ with $0.4 < P(A) < 0.6$, we will
have a measure between 0 and 1. Figure 5.5 (b) illustrates this case for a
set $A$ such that $0.4 < P(A) < 0.6$.

4. For some applications, it is useful to give a discrete representation of the
function in terms of a weighting vector $w = (w_1, \ldots, w_N)$ (i.e., $w_i \geq 0$
and $\sum_i w_i = 1$). In this case, the function $f$ can be interpolated from the
set $\{(i/N, \sum_{j \leq i} w_j)\}_{i = 0, \ldots, N}$. This is represented in Figure 5.5 (c)
for $N = 5$. This will be studied in more detail in Section 6.1.3.
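Remarks 3 and 4 can be sketched as follows. Both functions use linear interpolation, which is an assumption made here for simplicity: the text only requires values between 0 and 1 for the quantifier, and the interpolation method actually intended for weighting vectors is the one discussed in Section 6.1.3. The probability distribution `p` is hypothetical.

```python
def at_least_around_half(x):
    """Quantifier Q of remark 3, linear between 0.4 and 0.6 (assumed shape)."""
    if x <= 0.4:
        return 0.0
    if x >= 0.6:
        return 1.0
    return (x - 0.4) / 0.2

def distortion_from_weights(w):
    """Piecewise-linear f through the points (i/N, w_1 + ... + w_i), i = 0..N."""
    N = len(w)
    cum = [0.0]
    for wi in w:
        cum.append(cum[-1] + wi)
    def f(x):
        x = min(max(x, 0.0), 1.0)
        i = min(int(x * N), N - 1)      # segment that contains x
        t = x * N - i                   # relative position inside the segment
        return cum[i] + t * (cum[i + 1] - cum[i])
    return f

f = distortion_from_weights([0.1, 0.2, 0.4, 0.2, 0.1])

# Hypothetical probability distribution on X = {a, b, c}.
p = {"a": 0.2, "b": 0.3, "c": 0.5}

def mu(A, distortion=f):
    """Distorted probability f(P(A))."""
    return distortion(sum(p[x] for x in A))
```

Calling `mu(A, at_least_around_half)` gives the measure "at least around 50% of the probability" for the same distribution.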


All fuzzy measures can be represented in terms of a probability distribution
and a real-valued function (see Theorem 5.62 below). Nevertheless, not all
measures are distorted probabilities. This is so because, for most measures $\mu$,
none of the pairs $(f, P)$ that can represent the measure $\mu$ includes a strictly
increasing function $f$. In fact, the number of fuzzy measures that are distorted
probabilities is rather small in comparison with the total number of fuzzy
measures.


Theorem 5.62. For every fuzzy measure $\mu$ on $(X, \wp(X))$, there exist a
polynomial $f$ and a probability $P$ on $(X, \wp(X))$ such that $\mu = f \circ P$.


To compare the number of unconstrained fuzzy measures and of distorted
probabilities, we classify the measures into sets of consistent fuzzy measures
(following Definition 5.10). The following example illustrates the number
of such sets for both types of measures for a set $X$ with 3 elements.


Example 5.63. Let $X = \{1, 2, 3\}$, and let $\mu$ be such that
$\mu(\{1\}) < \mu(\{2\}) < \mu(\{3\})$. Then,


a) when $\mu$ is a distorted probability, either one of the following holds:

$\mu(\emptyset) < \mu(\{1\}) < \mu(\{2\}) < \mu(\{3\}) < \mu(\{1,2\}) < \mu(\{1,3\}) < \mu(\{2,3\}) < \mu(X)$
$\mu(\emptyset) < \mu(\{1\}) < \mu(\{2\}) < \mu(\{1,2\}) < \mu(\{3\}) < \mu(\{1,3\}) < \mu(\{2,3\}) < \mu(X)$;

b) when $\mu$ is an unconstrained fuzzy measure, one of the following holds:

$0 < \mu(\{1\}) < \mu(\{2\}) < \mu(\{1,2\}) < \mu(\{3\}) < \mu(\{1,3\}) < \mu(\{2,3\}) < \mu(X)$
$0 < \mu(\{1\}) < \mu(\{2\}) < \mu(\{3\}) < \mu(\{1,2\}) < \mu(\{1,3\}) < \mu(\{2,3\}) < \mu(X)$
$0 < \mu(\{1\}) < \mu(\{2\}) < \mu(\{3\}) < \mu(\{1,3\}) < \mu(\{1,2\}) < \mu(\{2,3\}) < \mu(X)$
$0 < \mu(\{1\}) < \mu(\{2\}) < \mu(\{3\}) < \mu(\{1,3\}) < \mu(\{2,3\}) < \mu(\{1,2\}) < \mu(X)$
$0 < \mu(\{1\}) < \mu(\{2\}) < \mu(\{1,2\}) < \mu(\{3\}) < \mu(\{2,3\}) < \mu(\{1,3\}) < \mu(X)$
$0 < \mu(\{1\}) < \mu(\{2\}) < \mu(\{3\}) < \mu(\{1,2\}) < \mu(\{2,3\}) < \mu(\{1,3\}) < \mu(X)$
$0 < \mu(\{1\}) < \mu(\{2\}) < \mu(\{3\}) < \mu(\{2,3\}) < \mu(\{1,2\}) < \mu(\{1,3\}) < \mu(X)$
$0 < \mu(\{1\}) < \mu(\{2\}) < \mu(\{3\}) < \mu(\{2,3\}) < \mu(\{1,3\}) < \mu(\{1,2\}) < \mu(X)$


Thus, among the eight sets obtained for unconstrained fuzzy measures,
only two are also distorted probabilities. Therefore, there are six types/sets of
fuzzy measures that cannot be represented as distorted probabilities.
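The count for distorted probabilities can be checked empirically. With a strictly increasing $f$, the ordering of the sets by $\mu = f \circ P$ is the ordering by $P$, so it suffices to sample probability distributions with $p_1 < p_2 < p_3$ and collect the orderings they induce. The sampling scheme, the seed, and the number of samples below are arbitrary choices of ours; this is an illustration, not a proof.

```python
import random
from itertools import combinations

SUBSETS = [c for r in range(4) for c in combinations((1, 2, 3), r)]

def ordering(p):
    """Subsets of {1, 2, 3} sorted by increasing probability."""
    return tuple(sorted(SUBSETS, key=lambda A: sum(p[i - 1] for i in A)))

random.seed(0)
seen = set()
for _ in range(10000):
    a, b, c = sorted(random.random() for _ in range(3))   # p1 < p2 < p3
    total = a + b + c
    seen.add(ordering((a / total, b / total, c / total)))

print(len(seen))   # 2
```

The two orderings differ only in whether $\{3\}$ or $\{1, 2\}$ comes first, that is, in whether $p_3 < p_1 + p_2$ or not.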


Table 5.8 compares the sets of consistent measures for different cardinalities
of $X$ and for both types of measures. It can be observed that the larger
the set $X$, the larger the gap between the two sets. Thus, the number of
different distorted probabilities (with respect to consistency) is rather small
with respect to the number of unconstrained fuzzy measures. $m$-dimensional
distorted probabilities have been defined to fill this gap.

$|X|$ | Distorted Probabilities | Unconstrained Fuzzy Measures
      | 14                      |
      | 546                     |
      | 215470                  |

Table 5.8. Number of nonempty consistent sets for both distorted probabilities
and unconstrained fuzzy measures when $\mu(\{1\}) < \mu(\{2\}) < \ldots$. Here,
$O(10^{\ldots})$ is an estimate

Fig. 5.6. Graphical interpretation of the fuzzy measure in Example 5.65 as a
two-dimensional distorted probability


5.4.1 m-Dimensional Distorted Probabilities 


Just as $k$-order additive fuzzy measures with $k = 1, \ldots, |X|$ cover the whole
set of fuzzy measures, $m$-dimensional distorted probabilities also cover this
set for different values of $m$. We define them below.


Definition 5.64. Let $P = \{X_1, X_2, \ldots, X_m\}$ be a partition of $X$; then, we
say that $\mu$ is an at most $m$-dimensional distorted probability if there exists a
function $f$ on $[0,1]^m$ and probabilities $P_i$ on $(X_i, \wp(X_i))$ such that

$$\mu(A) = f(P_1(A \cap X_1), P_2(A \cap X_2), \ldots, P_m(A \cap X_m)), \qquad (5.12)$$

where $f$ is strictly increasing with respect to the $i$th axis for all
$i = 1, 2, \ldots, m$.

We say that an at most $m$-dimensional distorted probability is an
$m$-dimensional distorted probability if $\mu$ is not an at most $(m-1)$-dimensional
one.


As with the case of $k$-order additive fuzzy measures, we have that any fuzzy
measure can be represented as an $m$-dimensional distorted probability with an
appropriate value of $m$. Therefore, if $[DP]^m$ is the set of all $m$-dimensional
distorted probabilities on $X$, $\{[DP]^m\}_{m \in \{1, \ldots, |X|\}}$ is a partition of
the set of all fuzzy measures on $X$.

We now reconsider Example 5.12 in the light of distorted probabilities. 


Example 5.65. The fuzzy measure on $X := \{M, L, P\}$ given in Example 5.12
(and outlined in Table 5.2) is a two-dimensional distorted probability. For
building the distorted probability, we need to consider two sets. One set
corresponds to the science subjects $\{M, P\}$ and the other corresponds to the
literary subject $\{L\}$. A graphical interpretation of the measure is given in
Figure 5.6. In this figure, each axis represents a partition element. Therefore,
one axis corresponds to the set $\{L\}$ and the other to the science subjects
$\{M, P\}$. The values of the measure are also represented in the figure: for each
pair of disjoint sets $(A, B)$, we have the value of the measure for $A \cup B$. For
example, the value for $\{L\} \times \{M, P\}$ corresponds to
$\mu(\{L\} \cup \{M, P\}) = \mu(X) = 1$. It can be seen that the measure is increasing
in both axes. Using the probabilities $P_1$ on the set $\{L\}$ and $P_2$ on the set
$\{M, P\}$, defined as $P_1(\{L\}) = 1$ and $P_2(\{M\}) = P_2(\{P\}) = 0.5$, and using
the distortion function

$$f(x, y) := -0.8x^2 + 0.4xy - 0.2yx^2 + 1.3x + 0.3y,$$

we have $\mu(A) = f(P_2(A \cap \{M, P\}), P_1(A \cap \{L\}))$; that is, the first
argument of $f$ corresponds to the science subjects, which yields
$\mu(\{M\}) = f(0.5, 0) = 0.45$ and $\mu(\{L\}) = f(0, 1) = 0.3$. Note that the
function $f$ is strictly increasing with respect to probabilities $P_1$ and $P_2$,
but not strictly increasing in all $[0,1] \times [0,1]$. A graphical representation
of the function $f$ is given in Figure 5.7. Only the relevant values with respect
to the probabilities $P_1$ and $P_2$ are given in the figure.

Fig. 5.7. The function $f$ in Example 5.65. Note that only relevant values with
respect to the probabilities $P_1$ and $P_2$ are displayed
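The following sketch evaluates this two-dimensional distorted probability on all subsets. It reads the first argument of $f$ as the probability mass on the science subjects and the second as the mass on $\{L\}$; this assignment is our reading, chosen because it reproduces the singleton values 0.45 and 0.3 quoted in Example 5.51. The dictionary and function names are ours.

```python
def f(x, y):
    """Distortion function; x: mass on {M, P}, y: mass on {L} (assumed reading)."""
    return -0.8 * x**2 + 0.4 * x * y - 0.2 * y * x**2 + 1.3 * x + 0.3 * y

P_science = {"M": 0.5, "P": 0.5}
P_literary = {"L": 1.0}

def mu(A):
    x = sum(P_science[e] for e in A if e in P_science)
    y = sum(P_literary[e] for e in A if e in P_literary)
    return f(x, y)

values = {name: mu(set(name)) for name in ["M", "P", "L", "MP", "ML", "PL", "MPL"]}
# Approximately: M 0.45, P 0.45, L 0.3, MP 0.5, ML 0.9, PL 0.9, MPL 1.0
```

The comparison `f(0.9, 0) > f(1, 0)` evaluates to true, which illustrates that $f$ is not increasing on the whole unit square even though it is increasing on the points actually used.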





No relation has been established between $m$-dimensional distorted
probabilities and $k$-order additive fuzzy measures. In fact, the space of measures
is different, and, thus, in some situations the use of a $k$-order additive fuzzy
measure is preferable (as it gives a more compact representation), while in
some other situations an $m$-dimensional one is preferable. We show below an
example of an $|X|$-order additive fuzzy measure that can be easily represented
as a two-dimensional distorted probability.


Example 5.66. Let us consider the distorted probability $\mu_{p,w}$ over
$X = \{x_1, x_2, x_3, x_4, x_5\}$ generated by the probability distribution
$p = (0.2, 0.3, 0.1, 0.2, 0.1)$, and a distortion function generated from the
weighting vector $w = (0.1, 0.2, 0.4, 0.2, 0.1)$ (see remark 4 after
Definition 5.61). The measure for all subsets of $X$, as well as the Möbius
transform of this measure, are given in Table 5.9 (column $\mu_{p,w}$).

As the Möbius transform is different from 0 for all nonempty subsets of $X$,
this means that $\mu_{p,w}$ is a 5-order additive fuzzy measure. That is, there is no
$k$-order additive fuzzy measure for $k < 5$ equivalent to $\mu_{p,w}$.
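The Möbius transform itself is straightforward to compute by brute force. The sketch below applies it to the values that Table 5.9 lists for the subsets of $\{x_3, x_4, x_5\}$ (elements are encoded by their index); the Möbius value of a set depends only on the measure of its subsets, so this restriction is enough to reproduce the corresponding rows. The function name is ours.

```python
from itertools import combinations

def mobius(mu):
    """m(A) = sum over B subset of A of (-1)^(|A| - |B|) * mu(B)."""
    m = {}
    for A in mu:
        total = 0.0
        for r in range(len(A) + 1):
            for B in combinations(sorted(A), r):
                total += (-1) ** (len(A) - r) * mu[frozenset(B)]
        m[A] = total
    return m

# Values of the measure for the subsets of {x3, x4, x5}, copied from Table 5.9.
mu = {
    frozenset(): 0.0,
    frozenset({5}): 0.04296875,
    frozenset({4}): 0.1375,
    frozenset({4, 5}): 0.2375,
    frozenset({3}): 0.06909722,
    frozenset({3, 5}): 0.1375,
    frozenset({3, 4}): 0.3,
    frozenset({3, 4, 5}): 0.5,
}
m = mobius(mu)
# m[{4, 5}] is about 0.05703125 and m[{3, 4, 5}] about 0.07456597.
```

The cost is exponential in $|A|$, which is acceptable for sets of this size.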

Subset of $X = \{x_1, x_2, x_3, x_4, x_5\}$; $\mu_{p,w}$; Möbius transform
{00000} 0.0 0.0
{00001} 0.04296875 0.04296875 
{00010} 0.1375 0.1375 
{00011} 0.2375 0.05703125 
{00100} 0.06909722 0.06909722 
{00101} 0.1375 0.02543403 
{00110} 0.3 0.09340278 
{00111} 0.5 0.07456597 
{01000} 0.18333333 0.18333333 
{01001} 0.3 0.07369792 
{01010} 0.61666666 0.29583333 
{01011} 0.7625 -0.0278646 
{01100} 0.38333333 0.13090278 
{01101} 0.61666666 0.09123264 
{01110} 0.81666666 -0.0934027 
{01111} 0.9 -0.2537327 
{10000} 0.1 0.1 
{10001} 0.18333333 0.04036458 
{10010} 0.38333333 0.14583333 
{10011} 0.61666666 0.09296875 
{10100} 0.2375 0.06840278 
{10101} 0.38333333 0.03706597 
{10110} 0.70000000 0.08576389 
{10111} 0.81666666 -0.2537326 
{11000} 0.5 0.21666667 
{11001} 0.7 0.04296875 
{11010} 0.8625 -0.2166667 
{11011} 0.93090277 -0.2537326 
{11100} 0.7625 -0.0059028 
{11101} 0.8625 -0.2537326 
{11110} 0.95703125 -0.2537326 
{11111} 1.0 0.50746528 
Table 5.9. Fuzzy measure $\mu_{p,w}$ and its Möbius transform. The first column
denotes the subsets of $X = \{x_1, \ldots, x_5\}$ (a 0 in the $i$th position means that
$x_i$ is not included, while a 1 in the $i$th position means that $x_i$ is included)


5.4.2 Properties 


A few properties have been proved that establish relationships between this 
family of measures and some other families. We review them below. 


Proposition 5.67. Any fuzzy measure decomposable by means of a continu- 
ous Archimedean t-conorm is a distorted probability. 


Proof. Let $\mu$ be the decomposable fuzzy measure, and let $\perp$ be the
continuous Archimedean t-conorm with generator $g$. Then, $\perp(x, y)$ can be
expressed according to Theorem 2.49 as $g^{(-1)}(g(x) + g(y))$ for the strictly
increasing function $g$. Now, $\mu$ can be expressed as $f \circ P$, considering (i) the
probabilities $p_i = q_i / K$, where $q_i = g(\mu(\{x_i\}))$ and
$K = \sum_{x_i \in X} q_i$; and (ii) the distortion function $f(x) = f'(x \times K)$,
where $f'(x) = g^{(-1)}(x)$.


Corollary 5.68. Any Sugeno $\lambda$-measure is a distorted probability.


Proof. For a Sugeno $\lambda$-measure with $\lambda = 0$, the distorted probability is
defined with $f(x) = x$ and $p_i = \mu(\{x_i\})$. In the general case, with
$\lambda \neq 0$, we have (i) $f(x) = (e^{x \ln(1 + \lambda)} - 1)/\lambda$ and (ii)
$p_i = \ln(1 + \lambda \mu(\{x_i\})) / \ln(1 + \lambda)$.
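This representation can be checked on the measure of Example 5.41. The sketch below reads the distortion as $f(x) = (e^{x \ln(1+\lambda)} - 1)/\lambda$ and the probabilities as $p_i = \ln(1 + \lambda \mu(\{x_i\}))/\ln(1 + \lambda)$, and compares $f \circ P$ with the closed form of Proposition 5.39. It uses the rounded $\lambda_2 = 2.385385$, so the probabilities add up to 1 only approximately; the function names are ours.

```python
from itertools import combinations
from math import exp, log, prod

lam = 2.385385                      # rounded value taken from Example 5.41
v = {"M": 0.2237, "P": 0.2237, "L": 0.1842}

# (ii) probabilities on the singletons
p = {x: log(1 + lam * v[x]) / log(1 + lam) for x in v}

# (i) distortion function
def f(x):
    return (exp(x * log(1 + lam)) - 1) / lam

def mu_distorted(A):
    return f(sum(p[x] for x in A))

def mu_sugeno(A):
    return (prod(1 + lam * v[x] for x in A) - 1) / lam

subsets = [set(c) for r in range(4) for c in combinations(sorted(v), r)]
largest_gap = max(abs(mu_distorted(A) - mu_sugeno(A)) for A in subsets)
```

The two expressions agree because $e^{\sum_i \ln(1 + \lambda v_i)} = \prod_i (1 + \lambda v_i)$.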


This function $f$ is used in Definition 7.27 to define Sugeno $\lambda$-quantifiers.


Proposition 5.69. Any fuzzy measure that is decomposable by means of the
t-conorm $\perp$ = maximum is a distorted probability.


Additionally, it is easy to show that, in general, distorted probabilities with
nondecreasing functions are not decomposable fuzzy measures. The following
example illustrates this situation.


Example 5.70. Let $\mu$ be a fuzzy measure on $X = \{a, b, c\}$ defined as follows:

$$\mu(\emptyset) = 0, \quad \mu(\{a\}) = 0, \quad \mu(\{b\}) = 0, \quad \mu(\{c\}) = 0,$$
$$\mu(\{a, b\}) = 0.2, \quad \mu(\{a, c\}) = 0.4, \quad \mu(\{b, c\}) = 0.4, \quad \mu(\{a, b, c\}) = 1.$$

This measure is a distorted probability. Note that, with the probability
distribution $p(a) = 0.2$, $p(b) = 0.35$, and $p(c) = 0.45$, and with the function
$f$ defined below, we have that $\mu$ can be represented by $f$ and $p$.

$$f(x) = \begin{cases}
0 & \text{if } x < 0.5 \\
0.2 & \text{if } 0.5 \leq x < 0.6 \\
0.4 & \text{if } 0.6 \leq x < 0.85 \\
1.0 & \text{if } 0.85 \leq x \leq 1.0
\end{cases}$$

This function is represented in Figure 5.8. This measure is not a
$\perp$-decomposable fuzzy measure because there is no t-conorm such that
$\perp(0, 0) \neq 0$. Note that, as $\mu(\{a, b\}) = 0.2$ when $\mu(\{a\}) = 0$ and
$\mu(\{b\}) = 0$, we would require
$0.2 = \mu(\{a, b\}) = \perp(\mu(\{a\}), \mu(\{b\})) = \perp(0, 0)$.
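The representation claimed in this example is easy to verify mechanically. The sketch below encodes the step function and the probability distribution and evaluates $f(P(A))$ on every subset; the names are ours.

```python
from itertools import combinations

p = {"a": 0.2, "b": 0.35, "c": 0.45}

def f(x):
    """Step distortion function of Example 5.70."""
    if x < 0.5:
        return 0.0
    if x < 0.6:
        return 0.2
    if x < 0.85:
        return 0.4
    return 1.0

def mu(A):
    return f(sum(p[e] for e in A))

table = {frozenset(c): mu(c) for r in range(4) for c in combinations("abc", r)}
```

All singletons receive measure 0 while $\{a, b\}$ receives 0.2, which is the situation that no t-conorm can reproduce.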


Distorted probabilities and m-dimensional distorted probabilities gener- 
alize, respectively, the symmetric and m-symmetric fuzzy measures. This is 
established below. We start by considering the case of symmetric fuzzy mea- 
sures. 


Definition 5.71. Let $\mu$ be a fuzzy measure on $X$. Then, $\mu$ is
said to be symmetric when the measure of a subset of $X$ depends only on
the cardinality of the set, and not on the elements in the set. That is, for all
$A, B \subseteq X$, it holds that

$$\text{if } |A| = |B| \text{ then } \mu(A) = \mu(B).$$

Fig. 5.8. Distortion function for Example 5.70.


Note that, if we consider a distorted probability μ on X generated from f
and P such that $p_i = 1/|X|$, then μ is a symmetric fuzzy measure. Moreover,
if the distortion function f is described in terms of a weighting vector (see
remark 4 after Definition 5.61), then $\mu(A) = \sum_{j=1}^{|A|} w_j$ is a symmetric fuzzy
measure. This is formalized in the next proposition.


Proposition 5.72. Let μ be a distorted probability on X represented by f ∘ P
such that $p_i = 1/|X|$; then, μ is a symmetric fuzzy measure. Let f be
represented in terms of a weighting vector $w = (w_1, \ldots, w_N)$ (i.e., $w_i \ge 0$ and
$\sum_i w_i = 1$); then, μ is a symmetric fuzzy measure.


Proof. Using the approach described in remark 4 after Definition 5.61, we will
have the distortion function as defined by interpolation of the points in the
set $\{(i/N, \sum_{j=1,\ldots,i} w_j)\}_{i=0,\ldots,N}$. Since P(A) = |A|/N, the function is
evaluated only at these points. Therefore, $\mu(A) = \sum_{j=1}^{|A|} w_j$.
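
As a small illustration of Proposition 5.72, the following Python sketch builds the measure $\mu(A) = \sum_{j=1}^{|A|} w_j$ from a weighting vector and groups the values of all subsets by cardinality. The weights (0.5, 0.3, 0.2) and the element names are hypothetical.

```python
from itertools import combinations


def symmetric_measure(w):
    """Return mu with mu(A) = w_1 + ... + w_|A| for a weighting vector w."""
    cumulative = [0.0]
    for wj in w:
        cumulative.append(cumulative[-1] + wj)
    return lambda subset: cumulative[len(subset)]


X = ("x1", "x2", "x3")
mu = symmetric_measure((0.5, 0.3, 0.2))  # hypothetical weights

# Group the measure of every subset by cardinality: one value per size.
values_by_size = {}
for r in range(len(X) + 1):
    for s in combinations(X, r):
        values_by_size.setdefault(r, set()).add(mu(s))
print(values_by_size)
```

Each cardinality maps to a single value, which is the symmetry condition of Definition 5.71.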


Now, we turn to m-symmetric fuzzy measures. These measures rely on
sets of indifference. Roughly, a set A is of indifference if all its elements are 
indistinguishable with respect to the measure (i.e., if we can replace any ele- 
ment of A with another element of A and the measure does not change). This 
concept is formalized below. 


Definition 5.73. Given a subset A of X, we say that A is a set of indifference
if and only if

$$\forall B_1, B_2 \subseteq A \text{ with } |B_1| = |B_2|, \quad \forall C \subseteq X \setminus A: \quad \mu(B_1 \cup C) = \mu(B_2 \cup C).$$
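
For small reference sets, Definition 5.73 can be tested by brute force. The sketch below, in Python, enumerates every C ⊆ X \ A and every cardinality of B ⊆ A; the function names and the example measure (a and b contribute 1/4 each, c contributes 1/2) are hypothetical. The enumeration is exponential in |X|, so it is only meant for illustration.

```python
from itertools import combinations


def subsets(items):
    items = sorted(items)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)


def is_set_of_indifference(A, X, mu, tol=1e-12):
    """Check Definition 5.73 by enumeration (exponential in |X|)."""
    A = frozenset(A)
    for C in subsets(frozenset(X) - A):
        for r in range(len(A) + 1):
            # All subsets B of A with |B| = r must give the same mu(B u C).
            vals = [mu(frozenset(B) | C) for B in combinations(sorted(A), r)]
            if max(vals) - min(vals) > tol:
                return False
    return True


# Hypothetical measure on X = {a, b, c}.
X = {"a", "b", "c"}


def mu(S):
    return (len(S & {"a", "b"}) + 2 * len(S & {"c"})) / 4


print(is_set_of_indifference({"a", "b"}, X, mu))  # {a, b} is of indifference
print(is_set_of_indifference(X, X, mu))           # X itself is not
```

Here {a, b} and {c} are sets of indifference while X is not, so this measure is symmetric with respect to the partition {{a, b}, {c}} but not symmetric.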


We now consider m-symmetric fuzzy measures for the particular case of 
m = 2; then, we give the general definition. 


Definition 5.74. Given a fuzzy measure μ, we say that μ is an at most
2-symmetric fuzzy measure if and only if there exists a partition of the universal
set $\{X_1, X_2\}$, with $X_1, X_2 \neq \emptyset$, such that both $X_1$ and $X_2$ are sets of
indifference. An at most 2-symmetric fuzzy measure is a 2-symmetric one if
X is not a set of indifference.

Definition 5.75. Given a fuzzy measure μ, we say that μ is an at most
m-symmetric fuzzy measure if and only if there exists a partition of the universal
set $\{X_1, \ldots, X_m\}$, with $X_1, \ldots, X_m \neq \emptyset$, such that $X_1, \ldots, X_m$ are sets of
indifference.


We then say that μ is an m-symmetric fuzzy measure when it is at most
m-symmetric but not (m − 1)-symmetric.


Proposition 5.76. Let μ be an m-symmetric fuzzy measure with respect to a
partition $\{X_1, \ldots, X_m\}$. Then, μ is an m-dimensional distorted probability.


5.5 Bibliographical Notes 


1. Fuzzy measures: The history of measure theory is described in [315]. 
For a state-of-the-art description of this field, see the Handbook of Measure 
Theory edited by Pap [312] and the collection by Fremlin on Measure 
Theory [137]. See also [118, 216, 331]. 

The concept denoted in this chapter by the term "fuzzy measure" is used
in several areas with different names. In particular, the names capacities,
monotone measures, monotone games, and premeasures are common.

Fuzzy measures were introduced by Sugeno in 1972 [382] in Japanese (in 
1974 [384] in English). Fuzzy measures are studied and described in several
works. For a general reference book, see the monograph by Wang and
Klir [427]. For a more specialized book, see the edited text by Grabisch,
Murofushi, and Sugeno [174]. See also the book by Sugeno and Muro- 
fushi [385] (in Japanese). Narukawa in [287] and Radojevic in [330] (see 
also [329]) proved, independently, that all fuzzy measures can be written 
as a weighted mean of 0-1 fuzzy measures. j-inter-additive partitions and 
inter-additive fuzzy measures were introduced in [283] (see also [282]). 

Capacities were studied by Choquet in [80]. The notion of capacity arose 
in the problem of electric distribution. Capacities have been studied by 
several mathematicians. For example, [422] surveyed the notion of capac- 
ities before 1937. [62] has studied the capacities of compact sets (finite 
sets imply compact sets). 

Monotone games were considered by Aumann and Shapley [27]. The 
term premeasures was used by Sipos [364]. Some old references dealing 
with fuzzy measures are listed in [263, 264]. 

0-1 fuzzy measures correspond to coalitions or simple games [239, 285]. 
m-Quota games, when bounded by 1, correspond to probabilities. Sym- 
metric games were considered in 1953 (see [239], p. 212). They correspond, 
when bounded by 1, to symmetric fuzzy measures. 

References for some particular families of measures are described below. 
See the notes in Chapter 2 on probability measures. Kolmogorov axioms 
are given in [215].

2. Interpretations of fuzzy measures: Several interpretations for fuzzy
measures are briefly given in [279]. μ(A) as a grade is described by Sugeno
in [384]. μ(A) as a degree of importance is common in some papers related
to aggregation (see [254], [321]). This meaning was already used for weights
in other aggregation operators in [78]. In particular, this work refers to
weights (p. 252) as the power of opinion that is a degree of importance (or
certainty, or competence) of an opinion. The related definition of μ as the
power of A to make the decision alone was given in [254] (see Section 2.1,
p. 626): μ(A) "can be interpreted as the weight of the degree of importance
of the combination A of criteria, or better, its power to make the decision
alone (without the remaining criteria)."

Interpretations based on probabilities have been studied by several au- 
thors, mainly in the setting of belief functions [367]. Halpern and Fagin 
in [120] give a detailed account of such interpretations. Belief functions as 
inner measures were considered by Dempster in 1967 [94]. The relation- 
ship between probability intervals and fuzzy measures has been studied 
by several authors. Dempster, in [94], considered the class of probabilities 
compatible with belief functions. [220] gives examples of lower envelopes 
that are not belief functions. The interpretation of belief as a probability 
that has suffered from information loss is given in [392]. N-dimensional
distorted probabilities, defined in [293], permit us to show that all fuzzy
measures can be interpreted in terms of a set of probability distributions
and a distortion function. The interpretation of fuzzy measures in terms 
of a mapping between spaces was given by Murofushi and Sugeno in [279]. 

Other interpretations not mentioned here include the Transferable Be- 
lief Model by Smets [368]. 

3. k-order additive fuzzy measures: k-order additive fuzzy measures 
were proposed in [168]. See [169] for additional results. The concept of k- 
order additive was generalized by Mesiar in [261, 262]. The generalization 
permits us to consider k-maxitive fuzzy measures. 

4. Some general aspects: Example 5.12 is based on [167]. The difficulties
of defining fuzzy measures (requiring $2^{|X|}$ values, and checking
monotonicity for |X|! different monotonic sequences) were already considered
by Sugeno in [384] (p. 13). The problem is solved by defining the Sugeno
λ-measures.

For the Möbius transform, see [335]. Generalizations of the Möbius
transform are given in [261, 262] using operators other than t-norms.
Definition 5.19 follows [70]. The term convexity was used in [358].

5. Belief and plausibility measures: They originated in evidence the- 
ory, and were originally proposed by Dempster [94] and developed by 
Shafer [355]. 

6. Possibility and necessity measures: Shafer [355] and Zadeh [460] 
introduced them in the context of fuzzy sets. 

7. Decomposable fuzzy measures: Weber [430] introduced decomposable
fuzzy measures in 1984. He uses infinite decomposability. In [431],
references to previous results (with finite and infinite decomposability)
are given. For example, Barnard [33] and Dubois and Prade [110] consider
the finite case.

In the case where a measure can be determined from a mapping v :
X → [0,1], v is known as the density of the measure.

λ-measures were introduced by Sugeno in 1973 [383] in Japanese (in
1974 [384] in English). [110] shows that Sugeno λ-measures are decomposable
fuzzy measures. Fung and Ku [158] used, also in 1973, a similar
measure, but with λ = −1. Proposition 5.40 was proved in [226]. See
also [388].

Hierarchically S-Decomposable Fuzzy Measures were defined in [398].


8. Distorted probabilities: Based on results in experimental psychology
around 1948 [326], Edwards defined distorted probabilities (see [115]
and [116]) in 1953. Descriptive models using distorted probabilities have
been studied in economics. For example, Handa [121] in 1977 and Kahne- 
man and Tversky [206] in 1979 (to develop Prospect theory) used distorted 
probabilities. In this framework, distortion functions are known as weight- 
ing functions (see [325] and [421]). Aumann and Shapley [27] used them 
in game theory. Distorted probabilities with respect to aggregation have 
been studied in [189, 190, 293]. [189] and [293] study the proportion of 
fuzzy measures that are distorted probabilities with respect to the total, 
the computations leading to Table 5.8. 

[293] gives a representation theorem for distorted probabilities when the 
distortion function is a strictly increasing polynomial. [69] gives necessary 
and sufficient conditions for the existence of a nondecreasing distortion 
function. [69] uses the results by Fishburn in [142]. Instead, [293] is based 
on the results in [351]. 

Distorted probabilities are equivalent to the Q-p-decomposable fuzzy 
measures introduced in [397] to establish that the WOWA operator is a 
particular case of the Choquet integral. 

m-dimensional distorted probabilities were introduced in [293]. m- 
symmetric fuzzy measures were defined in [269] and [270]. The proof that 
m-symmetric fuzzy measures are a particular case of m-dimensional dis- 
torted probabilities is in [291]. 


9. Other fuzzy measures: There exist other families of fuzzy measures,
and some generalizations of fuzzy measures. k-intolerant fuzzy measures
[250] are an example of a family of fuzzy measures. For an example
of generalization, see the nonmonotonic fuzzy measures. Introduced by
Murofushi, Sugeno, and Machida in [284], they are fuzzy measures where
the monotonicity condition has been dropped. A nonmonotonic fuzzy measure
can be represented as the subtraction of two monotonic fuzzy measures.
Formally, a nonmonotonic fuzzy measure is a set function μ with
μ(∅) = 0. For other measures, see the survey in [170].

10. Aggregation of measures: An interesting topic not discussed in this
book is the aggregation and combination of belief functions. This has been
studied by several authors. Dempster's Rule of Combination is the most
widely known combination method. Different methods have been proposed
on the basis of different assumptions and different interpretations of belief
functions. Halpern and Fagin [120] (see also [119]) differentiate between
updating (when belief is understood as generalized probabilities)
and combination (when belief is understood as evidence). Dempster's Rule
of Combination should be restricted to the latter case according to Halpern
and Fagin, and other rules should be applied for generalized probabilities.
Chateauneuf [68] has studied the combination of beliefs when they are
understood as probability intervals. Chateauneuf also proves that, with the
interpretation of beliefs as intervals of probability, the results of Dempster's
Rule of Combination are not consistent.


6 
From the Weighted Mean to Fuzzy Integrals 


Chiri mo tumoreba yamatonaru¹


Japanese saying 


In this chapter we review some aggregation operators for numerical informa- 
tion. While in Chapter 4 description was centered on functional equations, and 
operators were introduced as a natural consequence of some basic properties 
(unanimity, positive homogeneity, and so on), here, operators are introduced 
for greater modeling capabilities and generality. This progression into general 
aggregation operators leads to a review of operators that are particular cases 
of Choquet and Sugeno integrals. On the one hand, the Choquet integral 
generalizes not only arithmetic mean and weighted mean (the most widely 
used and well-known aggregation operators), but also OWA operators. On 
the other hand, the Sugeno integral generalizes weighted minimum, weighted 
maximum, and median operators. In the rest of this chapter we will use Cho- 
quet integral family to refer to aggregation operators that are generalized by 
the Choquet integral. In the same way, the Sugeno integral family will refer 
to aggregation operators that the Sugeno integral generalizes. 


6.1 Weighted Means, OWA, and WOWA Operators 


The simplest and most widely used aggregation operators are the arithmetic 
mean and the weighted mean. Recently (1988), Yager introduced another 
function, the OWA operator, to model aggregation in intelligent systems. The 
definition of the three functions is given below. As such definitions require a 
weighting vector, the section starts by recalling the definition of a weighting 
vector. All definitions in this chapter, unless stated otherwise, assume the N
values $a_1, \ldots, a_N$ to be fused.


¹ Many a little makes a mickle

Definition 6.1. A vector $v = (v_1, \ldots, v_N)$ is a weighting vector of dimension N
if and only if $v_i \in [0,1]$ and $\sum_i v_i = 1$.


Definition 6.2. A mapping AM: $\mathbb{R}^N \to \mathbb{R}$ is an arithmetic mean of dimension
N if $AM(a_1, \ldots, a_N) = (1/N) \sum_{i=1}^{N} a_i$.


Definition 6.3. Let p be a weighting vector of dimension N; then, a mapping
WM: $\mathbb{R}^N \to \mathbb{R}$ is a weighted mean of dimension N if

$$WM_p(a_1, \ldots, a_N) = \sum_{i=1}^{N} p_i a_i.$$


Definition 6.4. Let w be a weighting vector of dimension N; then, a mapping
OWA: $\mathbb{R}^N \to \mathbb{R}$ is an Ordered Weighted Averaging (OWA) operator of
dimension N if

$$OWA_w(a_1, \ldots, a_N) = \sum_{i=1}^{N} w_i a_{\sigma(i)},$$

where $\{\sigma(1), \ldots, \sigma(N)\}$ is a permutation of $\{1, \ldots, N\}$ such that $a_{\sigma(i-1)} \ge a_{\sigma(i)}$
for all $i \in \{2, \ldots, N\}$ (i.e., $a_{\sigma(i)}$ is the ith largest element in the collection
$a_1, \ldots, a_N$).


We consider below two situations that can be modeled, respectively, with 
the weighted mean and the OWA operator. 


Example 6.5. A university exam on algebra consists of three exercises. Each
exercise is evaluated in the [0,10] interval. The final rating of each student
is obtained as a weighted linear combination of his or her three marks. The
weights are assigned as follows: 0.5 for the first exercise; 0.25 for the second
exercise; and 0.25 for the third exercise. Therefore, a student with 8 points for
the first exercise, 6 for the second and 10 for the third will be rated as follows:
8 · 0.5 + 6 · 0.25 + 10 · 0.25 = 8. This process is modeled using a weighted mean
with weights p = (0.5, 0.25, 0.25).


Example 6.6. In the Olympic Games, the final rating for a participant in some 
sports is computed from the rates given by the judges. This final rating is 
the average of the rates of the judges once the largest and smallest ones 
are disregarded. This decision making process can be modeled by means of an 
OWA operator. In the case of five judges, we will use OWA with the weighting 
vector w = (w1, ..., w5) = (0, 1/3, 1/3, 1/3, 0). 
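
Definitions 6.3 and 6.4 translate almost literally into code. The following Python sketch reproduces Examples 6.5 and 6.6; the function names and the five judges' scores are hypothetical, while the weights are those of the examples.

```python
def weighted_mean(p, a):
    """WM_p(a) = sum_i p_i * a_i; weights are attached to the sources."""
    return sum(pi * ai for pi, ai in zip(p, a))


def owa(w, a):
    """OWA_w(a) = sum_i w_i * a_sigma(i); weights are attached to the ranks."""
    ordered = sorted(a, reverse=True)  # a_sigma(1) >= ... >= a_sigma(N)
    return sum(wi * ai for wi, ai in zip(w, ordered))


# Example 6.5: exam with three exercises.
exam_rating = weighted_mean((0.5, 0.25, 0.25), (8, 6, 10))

# Example 6.6: five judges; the scores below are made up for illustration.
scores = (9.0, 8.5, 9.5, 7.0, 9.0)
olympic_rating = owa((0, 1 / 3, 1 / 3, 1 / 3, 0), scores)

print(exam_rating, olympic_rating)
```

The only difference between the two functions is the sorting step, which is what moves the meaning of the weights from the sources to the positions of the values.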


6.1.1 Properties 


The OWA operator, as well as the weighted mean, gives a value that is between
the minimum and the maximum of the values to be fused. However, while the
OWA can model the minimum and the maximum, the weighted mean cannot.
Instead, the weighted mean can be used to model dictatorship (the value of
one of the sources is always selected) while OWA cannot. These situations are
modeled with the following weighting vectors.

• OWA equal to the minimum:
  $OWA_w(a_1, \ldots, a_N) = \min(a_1, \ldots, a_N)$ when w = (0, 0, ..., 0, 1)
• OWA equal to the maximum:
  $OWA_w(a_1, \ldots, a_N) = \max(a_1, \ldots, a_N)$ when w = (1, 0, ..., 0, 0)
• Dictatorship for the ith information source:
  $WM_p(a_1, \ldots, a_N) = a_i$ when $p_i = 1$ and $p_j = 0$ for all j ≠ i
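
These three special cases are easy to confirm on a concrete input. In the Python sketch below, the four values are hypothetical and the helper functions are our own minimal versions of the weighted mean and the OWA operator.

```python
def weighted_mean(p, a):
    return sum(pi * ai for pi, ai in zip(p, a))


def owa(w, a):
    return sum(wi * ai for wi, ai in zip(w, sorted(a, reverse=True)))


values = (3.0, 7.5, 1.2, 4.4)  # hypothetical inputs

as_min = owa((0, 0, 0, 1), values)             # all weight on the smallest
as_max = owa((1, 0, 0, 0), values)             # all weight on the largest
dictator = weighted_mean((0, 0, 1, 0), values)  # third source decides

print(as_min, as_max, dictator)
```

Note that the dictatorship result depends on where the value 1.2 sits in the input, whereas the two OWA results do not change if the inputs are permuted.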


An additional difference between OWA and weighted mean is that, in the
former, the order of the values $a_i$ is not relevant (order does not affect the
result), while in the latter, permutations of the arguments lead to different
results. This is because the outcome of the permutation σ in the OWA operator
is independent of the information sources. Therefore, OWA is a symmetric
operator, while weighted mean is not. OWA is also robust, in the sense that
it employs all the data minimizing the influence of outliers.

The OWA operator is an L-estimator (Definition 2.36). That is, it is a
linear combination of order statistics. OWA generalizes all order statistics.
Moreover, it is also known that it generalizes, among others, the median (the
central value of the values $a_i$), the kth minimum, the kth maximum, the arithmetic
mean, the α-trimmed and the (r, s)-trimmed means, and the α-winsorized
and (r, s)-winsorized means.

Let us recall some definitions. The ith order statistic (see Definition 2.37)
is denoted by $OS_i$. Then, the kth maximum is equivalent to $OS_{N-k+1}$, and
the kth minimum is equivalent to $OS_k$.


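Definition 6.7. A mapping M: $\mathbb{R}^N \to \mathbb{R}$ is a median of dimension N if

$$M(a_1, \ldots, a_N) = \begin{cases} \dfrac{a_{\sigma(N/2)} + a_{\sigma(N/2+1)}}{2} & \text{when } N \text{ is even} \\ a_{\sigma((N+1)/2)} & \text{when } N \text{ is odd,} \end{cases}$$

where σ is defined as above. Note that when N is odd, $M = OS_{(N+1)/2}$.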

The (r, s)-trimmed mean is the mean of values $a_1, \ldots, a_N$ once the r lowest
values and the s highest ones are removed. That is,
$(a_{\sigma(r+1)} + \cdots + a_{\sigma(N-s)})/(N - r - s)$. The (r, s)-winsorized mean is the
arithmetic mean when the omitted values are replaced by the nearest value to be
retained unchanged. That is,

$$\frac{r\, a_{\sigma(r+1)} + a_{\sigma(r+1)} + \cdots + a_{\sigma(N-s)} + s\, a_{\sigma(N-s)}}{N}.$$

The α-trimmed and α-winsorized means correspond to the previous cases when r = s,
and when 2α is the proportion of the values being omitted. Thus, αN is the
number of values to be trimmed at each end.
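
The median and the trimmed and winsorized means can all be written as OWA operators with suitable weighting vectors. The Python sketch below builds these vectors for weights indexed from the largest value to the smallest, following the verbal definitions (r lowest and s highest values affected). The helper names and the data, which contain one outlier, are hypothetical.

```python
import statistics


def owa(w, a):
    return sum(wi * ai for wi, ai in zip(w, sorted(a, reverse=True)))


def trimmed_weights(n, r, s):
    """Drop the s highest and the r lowest values; average the rest."""
    k = n - r - s
    return [1.0 / k if s <= i < n - r else 0.0 for i in range(n)]


def winsorized_weights(n, r, s):
    """Replace each dropped value by the nearest retained one."""
    w = [1.0 / n if s <= i < n - r else 0.0 for i in range(n)]
    w[s] += s / n            # the s highest become the largest retained value
    w[n - r - 1] += r / n    # the r lowest become the smallest retained value
    return w


def median_weights(n):
    w = [0.0] * n
    if n % 2:
        w[n // 2] = 1.0
    else:
        w[n // 2 - 1] = w[n // 2] = 0.5
    return w


data = [1.0, 2.0, 3.0, 5.0, 100.0]  # hypothetical readings with one outlier
n = len(data)
trimmed = owa(trimmed_weights(n, 1, 1), data)        # (2 + 3 + 5) / 3
winsorized = owa(winsorized_weights(n, 1, 1), data)  # (2 + 2 + 3 + 5 + 5) / 5
median = owa(median_weights(n), data)
print(trimmed, winsorized, median)
```

In all three cases the outlier 100 receives weight zero, which is the sense in which these operators limit the influence of extreme values.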

For N = 2, the OWA and the Hurwicz operator ($H(x) = \alpha \max(x) +
(1 - \alpha) \min(x)$) are equivalent, while for N > 2, the Hurwicz operator is a
particular case of OWA.

Order statistics, kth maximum, kth minimum, trimmed and winsorized
means, and median (except for even N) are operators based on the ordering
among values $a_i$. According to this, they can be used in ordinal scales. For
example, in the case of $OS_i$, the element that occupies the ith position is
selected. In fact, these operators are methods for element selection, and they
proceed by considering the input data as a multiset or bag. Therefore, all of them
satisfy symmetry. Some properties (and characterizations) of these operators
were given in Section 4.3.1, devoted to ordinal scales.


6.1.2 Interpretation of Weighting Vectors in WM and OWA 


From the definitions above, it can be observed that the weighted mean and 
the OWA operator have similar expressions: both are a linear combination of 
values with respect to a weighting vector. However, in spite of their similarity, 
the meaning of the weights is radically different due to the presence of the
(ordering) permutation σ in the OWA operator.

It is well known that, in the weighted mean, weighting vectors are used to
express the reliability of the information sources that have supplied a particular
value. That is, $p_i$ corresponds to a measure of the reliability of the ith
sensor or of the expertise of the ith expert. This is not the case with the OWA
operator, where weights, due to the ordering σ, assign importance to elements
according only to their position with respect to the others. In this way, a system
can reduce the importance of extreme values (or even ignore them, as
in Example 6.6), or give greater importance to small values rather than large
ones (for example, in the case of a robot that has to avoid collisions). This
corresponds to weighting the values rather than weighting the sources.

According to this interpretation, weighting vectors in these two operators 
are complementary (we will refer to them as p and w, as in the definitions 
above), and in some circumstances both are of interest in a single application. 

We consider below four scenarios where aggregation operators can be used.
These scenarios are later used to illustrate the meaning of the weighting vectors
for the weighted mean and OWA operators.


1. Multicriteria decision making: Several alternatives are considered, and 
one of them has to be selected (for example, we want to buy a car, and 
several brands are considered). Several criteria evaluate each alternative 
(e.g., comfort, price, security equipment) in the [0,1] interval (1 being 
adequate, 0 being inadequate). To select the best alternative, an overall 
rating is computed for each one. This rating is an average of the criteria. 

2. Fuzzy Constraint Satisfaction Problems: The optimal solution of a problem
has to satisfy some constraints. For a given possible solution, constraints
can be tested, and their evaluation returns a value in the unit
interval (0 when a constraint is not satisfied at all, 1 when it is completely
satisfied). To have an overall rating of the solution, the evaluations of all
constraints are aggregated.

3. Robot sensing (all data corresponding to the same time instant): A robot
receives the readings of five sensors, each measuring distance to the nearest
object. To avoid collisions, the robot estimates the distance of the nearest
object by means of a fusion of the readings.

4. Robot sensing (data obtained at different time instants): A case similar to 
the previous one, with the same goals, but with the robot only having a 
single sensor. In this case, in order to estimate the distance to the nearest 
object, the robot fuses the last reading with some previous ones. 


We consider now the use of weighted mean (WM) and OWA operators for
computing the rating or for fusing the sensor readings in the examples above.
Then, we interpret the weighting vectors p and w in the operators. Recall
that we use p to denote WM weights and w to denote OWA weights. This
difference is semantic, because both weights have the same structure (they
follow Definition 6.1).


1. Multicriteria Decision Making: The weighting vector p corresponds to the 
importance of the criteria (for example, we give more importance to price 
than to comfort), while w corresponds to the degree of compensation 
allowed among criteria. Large compensation (the selection of the largest
values) corresponds to evaluating an alternative as good (say) when at
least one criterion evaluates it as good. In contrast, no compensation (the
selection of the smallest value) corresponds to assigning a low score to an
alternative when at least one criterion is badly rated.

2. Fuzzy Constraint Satisfaction Problems: In this case, p corresponds to 
the importance of each constraint, while w corresponds to the degree of 
compensation between constraints, that is, the degree to which a bad 
evaluation of a constraint implies a bad evaluation of the solution, or how 
restrictions have to be addressed so that the solution is considered good. 

3. Robot sensing (all data corresponding to the same time instant): In this
case, p would be used to express the reliability of each sensor, while w
would be used to determine the degree to which small values are important
(to avoid collision), independently of the reliability of the sensors. Also,
w can be used to prevent the influence of outliers.

4. Robot sensing (data obtained at different time instants): In this case, p 
would be used to give more importance to recent data than old data, while 
w would be used, as in the previous example, to express the importance 
of small values or to diminish the influence of outliers. 


Let us now give an example corresponding to the fuzzy constraint satis- 
faction problem. 


Example 6.8. Let us consider two professors A and B who have to teach a 
course consisting of a tutorial and a training part. A number of fuzzy con- 
straints apply to the number of sessions of the course, the number of sessions 
given by the professors, and so on. Such constraints are listed below: 




• The total number of sessions is six.
• Professor A will give the tutorial, which should consist of about three
  sessions; three is the optimal number of sessions; a difference in the number
  of sessions greater than two is unacceptable.
• Professor B will give the training part, consisting of about two sessions.
• Both professors should give more or less the same number of sessions. A
  difference of one or two is half acceptable; a difference of three is
  unacceptable.


The constraints of this problem can be described using fuzzy sets. To do
so, we need to define the variables, the fuzzy sets (to describe the constraints
for the variables), and the constraints. We start defining the variables. Two
variables are considered:


• $x_A$: Number of sessions taught by Professor A
• $x_B$: Number of sessions taught by Professor B


With these variables, the four constraints above are translated into


C1: $x_A + x_B$ should be about 6
C2: $x_A$ should be about 3
C3: $x_B$ should be about 2
C4: $|x_A - x_B|$ should be about 0


Using fuzzy sets, we can evaluate to what extent any constraint is satisfied.
For example, if we have $\mu_6$ to express "about 6," then we can evaluate "$x_A + x_B$
should be about 6" by $\mu_6(x_A + x_B)$. So, given $\mu_6$, $\mu_3$, $\mu_2$, and $\mu_0$, we can
compute to what degree a solution pair $(x_A, x_B)$ satisfies all constraints. The
corresponding degrees of satisfaction will be:


$\mu_6(x_A + x_B)$
$\mu_3(x_A)$
$\mu_2(x_B)$
$\mu_0(|x_A - x_B|)$


To completely determine the satisfaction degrees, we need to define the
membership functions. We use the following membership function for expressing
the fourth constraint:

$$\mu_0(x) = \begin{cases} (2 - x)/2 & \text{if } 0 \le x < 1 \\ 0.5 & \text{if } 1 \le x \le 2 \\ (3 - x)/2 & \text{if } 2 < x < 3 \\ 0 & \text{if } x \ge 3 \end{cases}$$

For the other three constraints, we use the triangular membership func- 
tions represented in Figure 6.1. This corresponds to the following definitions, 
using Definition 2.41 for the membership functions: 


e pelz) = 15,6,7(2) 
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H3 





Fig. 6.1. Membership functions for Example 6.8 


e ua(x) = ps a(x) 
e u»(r)-— Hi 2 ax) 


Let us now consider a few pairs of values $(x_A, x_B)$ and their satisfaction
degrees according to the definitions above:

• (2, 2): $(\mu_6(4), \mu_3(2), \mu_2(2), \mu_0(0))$ = (0, 0.5, 1, 1)
• (2, 3): $(\mu_6(5), \mu_3(2), \mu_2(3), \mu_0(1))$ = (0.5, 0.5, 0.5, 0.5)
• (2, 4): $(\mu_6(6), \mu_3(2), \mu_2(4), \mu_0(2))$ = (1, 0.5, 0, 0.5)
• (3.5, 2.5): $(\mu_6(6), \mu_3(3.5), \mu_2(2.5), \mu_0(1))$ = (1, 0.5, 0.5, 0.5)
• (3, 2): $(\mu_6(5), \mu_3(3), \mu_2(2), \mu_0(1))$ = (0.5, 1, 1, 0.5)
• (3, 3): $(\mu_6(6), \mu_3(3), \mu_2(3), \mu_0(0))$ = (1, 1, 0.5, 1)


In order to rate the set of alternatives with respect to a global satisfaction,
we can combine the partial degrees of satisfaction using an aggregation
operator. That is, the satisfaction for a solution pair $(x_A, x_B)$, denoted by
$sat(x_A, x_B)$, will be


$$sat(x_A, x_B) = C(\mu_6(x_A + x_B), \mu_3(x_A), \mu_2(x_B), \mu_0(|x_A - x_B|)).$$


When no importances are given, we can define C as the arithmetic mean. 
Nevertheless, we might consider that some constraints are more important 
than others. Let us consider the following situation: 


• Professor A is more important than Professor B
• The number of sessions equal to six is the most important constraint (but
  not a crisp constraint)
• The difference in the number of sessions taught by the two professors is
  the least important constraint


We will model this situation by assigning the following weights to the
constraints: $p = (p_1, p_2, p_3, p_4) = (0.5, 0.3, 0.15, 0.05)$. That is, the first constraint,
C1, has weight 0.5, the second constraint, C2, has weight 0.3, C3 has weight
0.15, and C4 has weight 0.05. We can model the aggregation using the
weighted mean. Doing so, we obtain the following evaluation for the previous
pairs of values:

• sat(2, 2) = $WM_p$(0, 0.5, 1, 1) = 0.35
• sat(2, 3) = $WM_p$(0.5, 0.5, 0.5, 0.5) = 0.5
• sat(2, 4) = $WM_p$(1, 0.5, 0, 0.5) = 0.675
• sat(3.5, 2.5) = $WM_p$(1, 0.5, 0.5, 0.5) = 0.75
• sat(3, 2) = $WM_p$(0.5, 1, 1, 0.5) = 0.725
• sat(3, 3) = $WM_p$(1, 1, 0.5, 1) = 0.925
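
These evaluations can be recomputed with a few lines of Python. In the sketch below, the satisfaction degrees of each pair and the weights p are taken from the example; the dictionary layout and the names are ours.

```python
# Satisfaction degrees (mu_6, mu_3, mu_2, mu_0) of each pair (x_A, x_B).
DEGREES = {
    (2, 2): (0, 0.5, 1, 1),
    (2, 3): (0.5, 0.5, 0.5, 0.5),
    (2, 4): (1, 0.5, 0, 0.5),
    (3.5, 2.5): (1, 0.5, 0.5, 0.5),
    (3, 2): (0.5, 1, 1, 0.5),
    (3, 3): (1, 1, 0.5, 1),
}
P = (0.5, 0.3, 0.15, 0.05)  # importance of constraints C1..C4


def weighted_mean(p, a):
    return sum(pi * ai for pi, ai in zip(p, a))


sat_wm = {pair: weighted_mean(P, d) for pair, d in DEGREES.items()}
ranking_wm = sorted(sat_wm, key=sat_wm.get, reverse=True)

for pair in ranking_wm:
    print(pair, round(sat_wm[pair], 4))
```

The ranking starts with (3, 3), followed by (3.5, 2.5) and (3, 2), with (2, 4) in fourth position.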


So, with this model which considers the importance of each constraint, the 
best solution is when both professors give the same number of hours, and so 
the total number of sessions is six and the difference in the number of sessions 
is zero. The second best solution is when Professor A gives 3.5 sessions and 
Professor B gives 2.5 sessions, and the third solution is when Professor A gives 
three sessions and Professor B gives two sessions. 

Another matter to be taken into account when considering multiple constraints
is compensation. That is, how many values can have a bad evaluation.
Let us consider the case where one bad value does not matter. In this case,
we can model the aggregation using an OWA operator (if the importance of
the constraints is not considered). We show below the results of the OWA
operator when a weighting vector w = (1/3, 1/3, 1/3, 0) is used. This vector
stands for the lowest value to be discarded (a weight equal to 0), and all the
others having the same weight.

With such a weighting vector, the pairs of solutions are evaluated as fol- 
lows: 


• sat(2, 2) = $OWA_w$(0, 0.5, 1, 1) = 0.8333
• sat(2, 3) = $OWA_w$(0.5, 0.5, 0.5, 0.5) = 0.5
• sat(2, 4) = $OWA_w$(1, 0.5, 0, 0.5) = 0.6666
• sat(3.5, 2.5) = $OWA_w$(1, 0.5, 0.5, 0.5) = 0.6666
• sat(3, 2) = $OWA_w$(0.5, 1, 1, 0.5) = 0.8333
• sat(3, 3) = $OWA_w$(1, 1, 0.5, 1) = 1.0
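
The OWA evaluations can be recomputed in the same way. The Python sketch below uses the satisfaction degrees and the weighting vector w of the example; the names are ours.

```python
# Satisfaction degrees (mu_6, mu_3, mu_2, mu_0) of each pair (x_A, x_B).
DEGREES = {
    (2, 2): (0, 0.5, 1, 1),
    (2, 3): (0.5, 0.5, 0.5, 0.5),
    (2, 4): (1, 0.5, 0, 0.5),
    (3.5, 2.5): (1, 0.5, 0.5, 0.5),
    (3, 2): (0.5, 1, 1, 0.5),
    (3, 3): (1, 1, 0.5, 1),
}
W = (1 / 3, 1 / 3, 1 / 3, 0)  # discard the lowest degree, average the rest


def owa(w, a):
    return sum(wi * ai for wi, ai in zip(w, sorted(a, reverse=True)))


sat_owa = {pair: owa(W, d) for pair, d in DEGREES.items()}

for pair, value in sat_owa.items():
    print(pair, round(value, 4))
```

Since the weights are attached to positions and not to constraints, the pair (2, 2) is no longer penalized for failing the most important constraint: its only bad degree is the one that is discarded.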


With this model, the best solution is when both professors give the same num- 
ber of hours. Since in this case there are three constraints that are completely 
satisfied, the satisfaction degree is maximum (equal to 1). Two other solutions 
have a rate of 0.833. They correspond to Professor A giving two sessions and 
Professor B also giving two sessions, and Professor A giving three sessions 
and Professor B giving two sessions. Note that, for the pair (2,2), the first 
constraint is completely discarded in the evaluation because its satisfaction 
degree is the lowest (and equal to 0). 


6.1.3 The WOWA Operator 


The Weighted OWA (WOWA) operator was introduced to model situations in 
which both importance of information sources and importance of values had 
to be taken into account. The operator aggregates a set of values using two 
weighting vectors: one corresponding to the vector p in the weighted mean 
and the other corresponding to w in the OWA operator. The definition of the 
operator is as follows.

Definition 6.9. Let p and w be two weighting vectors of dimension N; then,
a mapping WOWA: $\mathbb{R}^N \to \mathbb{R}$ is a Weighted Ordered Weighted Averaging
(WOWA) operator of dimension N if

$$WOWA_{p,w}(a_1, \ldots, a_N) = \sum_{i=1}^{N} \omega_i a_{\sigma(i)},$$

where σ is defined as in the case of OWA (i.e., $a_{\sigma(i)}$ is the ith largest element
in the collection $a_1, \ldots, a_N$), and the weight $\omega_i$ is defined as

$$\omega_i = w^*\Big(\sum_{j \le i} p_{\sigma(j)}\Big) - w^*\Big(\sum_{j < i} p_{\sigma(j)}\Big),$$

with $w^*$ being a nondecreasing function that interpolates the points

$$\Big\{\Big(i/N, \sum_{j \le i} w_j\Big)\Big\}_{i=1,\ldots,N} \cup \{(0, 0)\}.$$

The function $w^*$ is required to be a straight line when the points can be
interpolated in this way.
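
A minimal Python sketch of Definition 6.9 is given below. It assumes that $w^*$ is the piecewise-linear interpolation of the points; this choice is nondecreasing and yields a straight line whenever the points are collinear, but it is only one admissible interpolation and other methods may give different values between the points. The function names and the test input are ours.

```python
def weighted_mean(p, a):
    return sum(pi * ai for pi, ai in zip(p, a))


def owa(w, a):
    return sum(wi * ai for wi, ai in zip(w, sorted(a, reverse=True)))


def wowa(p, w, a):
    """WOWA with a piecewise-linear w* (an assumption of this sketch)."""
    n = len(a)
    # Knots (i/N, w_1 + ... + w_i) for i = 0..N; i = 0 gives the point (0, 0).
    ys = [0.0]
    for wi in w:
        ys.append(ys[-1] + wi)

    def w_star(x):
        x = min(max(x, 0.0), 1.0)
        k = min(int(x * n), n - 1)   # index of the segment containing x
        return ys[k] + (ys[k + 1] - ys[k]) * (x * n - k)

    # sigma: indices ordered so that a[order[0]] is the largest value.
    order = sorted(range(n), key=lambda i: a[i], reverse=True)
    total = acc = prev = 0.0
    for i in order:
        acc += p[i]                   # sum of p_sigma(j) for j <= current rank
        cur = w_star(acc)
        total += (cur - prev) * a[i]  # omega_i * a_sigma(i)
        prev = cur
    return total


a = (0.5, 1, 1, 0.5)
p = (0.5, 0.3, 0.15, 0.05)
w = (1 / 3, 1 / 3, 1 / 3, 0)
uniform = (0.25, 0.25, 0.25, 0.25)

print(wowa(p, uniform, a), weighted_mean(p, a))  # equal: WOWA reduces to WM
print(wowa(uniform, w, a), owa(w, a))            # equal: WOWA reduces to OWA
```

With a uniform w the function $w^*$ is the identity, so each $\omega_i$ equals $p_{\sigma(i)}$ and the weighted mean is recovered; with a uniform p the cumulative sums fall exactly on the interpolation points, so $\omega_i = w_i$ and the OWA operator is recovered.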


From now on, we denote by ω the (weighting) vector $\omega = (\omega_1, \ldots, \omega_N)$.
We illustrate this operator by reconsidering Example 6.8. 


Example 6.10. Let us consider again Professors A and B, and the assignment
of $x_A$ and $x_B$ sessions to A and B, respectively. Now, if we want to evaluate the
different alternatives, modeling at the same time the fact that some constraints
are more important than others, and that we want some compensation, then
we can rate each alternative using the WOWA operator.

Let us consider, as in Example 6.8, that the importance of the constraints
is represented by the weights $p = (p_1, p_2, p_3, p_4) = (0.5, 0.3, 0.15, 0.05)$,
and that the compensation is represented by the weighting vector
$w = (1/3, 1/3, 1/3, 0)$. In this case, the WOWA operator permits us to aggregate
the satisfaction degrees $\mu_6(x_A + x_B)$, $\mu_3(x_A)$, $\mu_2(x_B)$, and $\mu_0(|x_A - x_B|)$. The
results obtained for the pairs in Example 6.8 are as follows:


• sat(2,4) = $WOWA_{p,w}(1, 0.5, 0, 0.5) = 0.8333$
• sat(3.5,2.5) = $WOWA_{p,w}(1, 0.5, 0.5, 0.5) = 0.8333$
• sat(3,2) = $WOWA_{p,w}(0.5, 1, 1, 0.5) = 0.8$
• sat(3,3) = $WOWA_{p,w}(1, 1, 0.5, 1) = 1.0$


We can see that the best solution is the pair (3, 3). Thus, this pair is considered
the best whether the evaluation uses the weighted mean, the OWA operator, or the
WOWA operator. This is not always the case: the relative positions of possible
solutions might change when using different functions. This is the case with
the pair (2, 4), which is in fourth position when the combination function
is either the weighted mean or the OWA operator, but has a better position
when we use the WOWA operator.

This is so because we assign the largest weight to the value $\mu_6(x_A + x_B)$.
Thus, when this value is high, as it is in the third and the fourth alternatives,
the final outcome is also high. Besides, in relation to compensation, we have
that the lowest values have to be discarded. By “lowest values” we mean those
that represent one fourth of the total with respect to the weights. That is, the
lowest values in a set J such that $\sum_{j \in J} p_j$ is around 0.25 are discarded. In the
case of the satisfaction degrees of (2, 4), i.e., (1, 0.5, 0, 0.5), we can
disregard the 0 (as it is the lowest value); but, as this value only has a weight
equal to 0.15, we can also disregard (part of) the value 0.5 (the second lowest
value). This improves the final evaluation of the pair (2, 4).


Rationale for WOWA's new weights 


Note that the WOWA operator is also a linear combination of the values with
respect to a vector (in this case $\omega$). When studied from this point of view,
the operator determines a weight $\omega_i$ for each value $a_{\sigma(i)}$ in terms of the two
weighting vectors p and w, in such a way that the initial weight $p_{\sigma(i)}$ for $a_{\sigma(i)}$
is increased (i.e., $\omega_i > p_{\sigma(i)}$) if the value $a_{\sigma(i)}$ is small and small values have
more importance than larger ones (the same holds if the value is large and
importance is given to large values). In contrast, when importance is given to
large values and $a_{\sigma(i)}$ is small, we have $\omega_i < p_{\sigma(i)}$ (the same holds if importance
is given to small values and $a_{\sigma(i)}$ is large). Here, small and large should not be
understood as absolute terms in the domain, but relative to the other values
in $a = (a_1, \ldots, a_N)$.

We now turn to the construction of $w^*$. The shape of this function
(especially its derivative) shows the relative importance of the elements. An
alternative definition for WOWA avoiding such a construction is given in
Section 6.1.4. That section also contains an example (see Example 6.16) of the
WOWA operator.


The construction of the interpolation function w* 


The WOWA operator defines the weighting vector $\omega$ in terms of differences
between pairs of points on the function $w^*$. The points are selected using the
weighting vector p, and the function is built using the vector w. The rationale
of this construction is described below.

In order to have WOWA operators generalizing the OWA operator, the
weights $\omega_i$ are supposed to be equal to $w_i$ when all sources have the same
importance. That is, when all weights $p_i$ are equal (i.e., $p_i = 1/N$), $\omega_i = w_i$ for
all $i \in \{1, \ldots, N\}$. From now on, we denote by $p^0$ the “same-importance” weighting
vector $p^0 = (p^0_1, \ldots, p^0_N) = (1/N, \ldots, 1/N)$. Note that, when $p = p^0$, no
interactions between weights are considered, and thus $p^0_{\sigma(1)} = 1/N$ and $w_1$
refer, respectively, to the weight for the information source that supplies the
largest value in $\{a_1, \ldots, a_N\}$ and the weight for the value itself. Similarly,
$p^0_{\sigma(2)} = 1/N$ and $w_2$ refer, respectively, to the weight for the source supplying
$a_{\sigma(2)}$ and the weight for this value. In general, $w_i$ and $p^0_{\sigma(i)}$ are weights referring
to the value $a_{\sigma(i)}$ and to the source that supplies this value.

Fig. 6.2. Building weights $\omega$ for the WOWA: (a) building $w^*$; (b) extraction of $\omega_1$
when $p_{\sigma(1)} > p^0_{\sigma(1)}$; (c) extraction of $\omega_1$ when $p_{\sigma(1)} < p^0_{\sigma(1)}$

Now, we put the relation between w and p in a graphical form. To do so,
we consider the set of points:

$$\Big\{\Big(\sum_{j \le i} p^0_{\sigma(j)}, \sum_{j \le i} w_j\Big)\Big\}_{i=1,\ldots,N} = \Big\{\Big(\sum_{j \le i} 1/N = i/N, \sum_{j \le i} w_j\Big)\Big\}_{i=1,\ldots,N}.$$

It is important to underline that differences between y-axis coordinates of two
consecutive points lead to $w_i$, and that differences between x-axis coordinates
of two consecutive points lead to the values $p^0_{\sigma(i)}$.

As $\sum_i w_i = 1$ and $\sum_i p^0_i = 1$, the coordinates of these points are in the unit
interval, and as $w_i \ge 0$ and $p^0_i \ge 0$ they shape a monotone function. We define $w^*$
as a function that is monotone and interpolates these points. Additionally, as
required by the definition of the WOWA operator, the function should be a
straight line when the points can be interpolated in this way. Figure 6.2 (a)
displays the points as well as the function $w^*$.

Selecting weights $\omega$ from the curve is like stretching or narrowing the
intervals $p^0_i$ on the x-axis. A modification of the $p^0_i$ causes a movement of the
points over the curve. The values $\omega_i$ are obtained after such a modification of
the values of $p^0_i$.

Let us consider the case of the largest element $a_{\sigma(1)}$ and the weights $\omega_1$
and $p_{\sigma(1)}$. In this case, if $p_{\sigma(1)} > p^0_{\sigma(1)} = 1/N$, it is natural that $\omega_1 \ge w_1$. Note
that, if we increase $p_{\sigma(1)}$ in relation to $p^0_{\sigma(1)}$ (i.e., $p_{\sigma(1)} = p^0_{\sigma(1)} + \alpha$ for $\alpha > 0$),
then $w^*(p_{\sigma(1)}) \ge w^*(p^0_{\sigma(1)})$. Naturally, this value, $w^*(p_{\sigma(1)})$, corresponds to
the weight $\omega_1 = w^*(p_{\sigma(1)}) = w^*(p^0_{\sigma(1)} + \alpha)$. This computation is represented
in Figure 6.2 (b). In a similar way, if we decrease $p_{\sigma(1)}$, then
$w^*(p_{\sigma(1)}) \le w^*(p^0_{\sigma(1)})$. This case is displayed in Figure 6.2 (c).

Fig. 6.3. Building weights $\omega$ for the WOWA: (a) extraction of $\omega_2$; (b) extraction of
$\omega_i$

Let us consider the second largest element $a_{\sigma(2)}$. The corresponding p
weight is $p_{\sigma(2)}$. In this case, the accumulated value $p^0_{\sigma(1)} + p^0_{\sigma(2)}$ is moved to
$p_{\sigma(1)} + p_{\sigma(2)}$. Therefore, we compute $w^*(p_{\sigma(1)} + p_{\sigma(2)})$, which corresponds to
$\omega_1 + \omega_2$ (see Figure 6.3 (a)), and, thus, $\omega_2 = w^*(p_{\sigma(1)} + p_{\sigma(2)}) - w^*(p_{\sigma(1)})$. It
is important to note that the value $\omega_2$ will depend not only on $p_{\sigma(2)}$, but also
on $p_{\sigma(1)}$. This is natural, as $p_{\sigma(1)}$ might be zero, and in this case, the largest
value with non-null importance is $a_{\sigma(2)}$.

The computations of all other $\omega_i$ proceed in a similar way. See Figure 6.3
(b) for the computation of $\omega_i$. As for $\omega_2$, the value $\omega_i$ depends not only on
$p_{\sigma(i)}$, but also on all values $p_{\sigma(j)}$ and $\omega_j$ for $j < i$. Note that this construction
depends on the permutation $\sigma$, which in turn depends on the values $a_i$ to be
aggregated. Therefore, the weights $\omega_i$ depend on the ordering inferred from
the values $a_i$, and, thus, given p and w, different sets of values lead to different
$\omega_i$.


Example 6.11. The weights w = (1/3, 1/3, 1/3, 0), used in Example 6.10, lead
to the following function $w^*$:

$$w^*(x) = \begin{cases} x/0.75 & \text{if } x \le 0.75 \\ 1 & \text{if } x > 0.75. \end{cases}$$
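
To make the dependence of the weights on the ordering of the inputs concrete, the following sketch extracts the weights $\omega_i$ for two of the input vectors of Example 6.10. It assumes that $w^*$ is the piecewise-linear interpolation of the points obtained from w = (1/3, 1/3, 1/3, 0), that is, x/0.75 up to 0.75 and 1 beyond; the helper name `wowa_weights` is hypothetical.

```python
def w_star(x):
    """w* for w = (1/3, 1/3, 1/3, 0): slope 1/0.75 up to 0.75, flat afterwards."""
    return min(x / 0.75, 1.0)


def wowa_weights(p, values):
    """Return sigma (decreasing order of the values) and the weights omega_i."""
    sigma = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    omegas, acc, prev = [], 0.0, 0.0
    for i in sigma:
        acc += p[i]               # accumulated p of the sources seen so far
        cur = w_star(acc)
        omegas.append(cur - prev) # difference of two consecutive points on w*
        prev = cur
    return sigma, omegas


p = (0.5, 0.3, 0.15, 0.05)
for a in [(1, 0.5, 0, 0.5), (0.5, 1, 1, 0.5)]:
    sigma, omegas = wowa_weights(p, a)
    value = sum(o * a[i] for o, i in zip(omegas, sigma))
    print(sigma, [round(o, 3) for o in omegas], round(value, 4))
```

The same p and w give different weight vectors for the two inputs, because the sources are accumulated in a different order.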


The shape of the interpolation function w* 


The construction given above shows that the shape of the function w*, and, if 
applicable, the shape of its derivative, gives information about the values that 
are considered more relevant. In particular, the larger the slope of w*, the 
larger the importance of the corresponding elements. Figures 6.4 (a), (b), and 
(c) correspond, respectively, to the situation of giving importance to large, 
medium, or average, and small values. Figure 6.4 (d) corresponds to the
situation of equal importance for all values.

Fig. 6.4. Function $w^*$: (a) largest importance to large values; (b) largest importance
to medium values; (c) largest importance to small values; (d) equal importance
to all values





Properties 


The WOWA operator generalizes the weighted mean and the OWA operator.
In particular, note that, when p = (1/N, ..., 1/N), it reduces to the OWA
operator:

$$WOWA_{p,w}(a_1, \ldots, a_N) = OWA_w(a_1, \ldots, a_N) \text{ for all } w \text{ and } a_i.$$

Also, when w = (1/N, ..., 1/N), the operator reduces to the weighted mean:

$$WOWA_{p,w}(a_1, \ldots, a_N) = WM_p(a_1, \ldots, a_N) \text{ for all } p \text{ and } a_i.$$

This implies that the WOWA operator generalizes all the operators
generalized by the weighted mean and the OWA. In particular, when w = p =
(1/N, ..., 1/N), we get the arithmetic mean:

$$WOWA_{p,w}(a_1, \ldots, a_N) = AM(a_1, \ldots, a_N) \text{ for all } a_i.$$
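
These reductions are easy to check numerically. The sketch below is a hedged illustration, assuming a piecewise-linear $w^*$ (one admissible interpolation); the helper names `wowa`, `owa`, and `wm` and the sample vectors are our own choices.

```python
def wowa(p, w, values):
    """WOWA with a piecewise-linear w* (an assumed interpolation choice)."""
    n = len(w)
    cum = [0.0]
    for wi in w:
        cum.append(cum[-1] + wi)

    def w_star(x):
        x = min(max(x, 0.0), 1.0)
        k = min(int(x * n), n - 1)           # segment [k/n, (k+1)/n]
        return cum[k] + (cum[k + 1] - cum[k]) * (x * n - k)

    sigma = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result, acc, prev = 0.0, 0.0, 0.0
    for i in sigma:
        acc += p[i]
        cur = w_star(acc)
        result += (cur - prev) * values[i]
        prev = cur
    return result


def owa(w, values):
    return sum(wi * a for wi, a in zip(w, sorted(values, reverse=True)))


def wm(p, values):
    return sum(pi * a for pi, a in zip(p, values))


a = (1, 0.5, 0, 0.5)
p = (0.5, 0.3, 0.15, 0.05)
w = (1/3, 1/3, 1/3, 0)
uniform = (0.25,) * 4

print(wowa(uniform, w, a), owa(w, a))          # p uniform: WOWA behaves like OWA
print(wowa(p, uniform, a), wm(p, a))           # w uniform: WOWA behaves like WM
print(wowa(uniform, uniform, a), sum(a) / 4)   # both uniform: arithmetic mean
```

Each printed pair should coincide up to floating-point rounding.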


6.1.4 OWA and WOWA Operators and Fuzzy Quantifiers 


Alternative definitions for OWA and WOWA operators exist based on fuzzy 
quantifiers (see Section 2.3.4). Definitions use a kind of fuzzy quantifier to ex- 
tract the weights. OWA uses a fuzzy quantifier instead of a vector, and WOWA 
uses a fuzzy quantifier (with interpretation equivalent to that for OWA) and 
a weighting vector (the p vector corresponding to that of the weighted mean).
The definitions are given below. The one for the fuzzy quantifier and its
generator is also included for the sake of completeness.


Definition 6.12. A function $Q : [0,1] \to [0,1]$ is a regular nondecreasing
fuzzy quantifier (nondecreasing fuzzy quantifier for short) if (i) $Q(0) = 0$; (ii)
$Q(1) = 1$; and (iii) $x > y$ implies $Q(x) \ge Q(y)$.


Although we will not use it here, the class of regular increasing monotone
(RIM) quantifiers is well known. These quantifiers satisfy (i), (ii), and the
following (iii') $x > y$ implies $Q(x) > Q(y)$. Sometimes, the term is also used to
denote nondecreasing quantifiers.

The generating function of a quantifier might be useful for studying its
properties. Formally, given a fuzzy quantifier Q, a function $q : [0,1] \to [0,1]$
is called the generating function of Q if it satisfies

$$Q(x) = \int_0^x q(t)\,dt,$$

where $q(t) \ge 0$ for all $t \in [0,1]$, and $\int_0^1 q(t)\,dt = 1$.


Definition 6.13. Let Q be a regular nondecreasing fuzzy quantifier; then, a
mapping $OWA_Q : \mathbb{R}^N \to \mathbb{R}$ is an Ordered Weighted Averaging (OWA)
operator of dimension N if

$$OWA_Q(a_1, \ldots, a_N) = \sum_{i=1}^{N} \big(Q(i/N) - Q((i-1)/N)\big)\, a_{\sigma(i)},$$

where $\sigma$ is defined as before.


From a practical point of view, the use of fuzzy quantifiers in OWA oper- 
ators is an advantage when the number of elements to be fused is not fixed 
beforehand. That is, the same quantifier can be used to aggregate values re- 
gardless of the number of information sources N. 
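
The practical advantage can be seen in a short sketch: one quantifier serves inputs of any length, since the weights are derived from Q at aggregation time. The quantifier $Q(x) = x^2$ and the input lists are hypothetical choices made only for this illustration.

```python
def owa_q(Q, values):
    """OWA_Q: weights Q(i/N) - Q((i-1)/N) applied to values in decreasing order."""
    n = len(values)
    ordered = sorted(values, reverse=True)
    return sum((Q(i / n) - Q((i - 1) / n)) * a
               for i, a in enumerate(ordered, start=1))


def Q(x):
    """Hypothetical quantifier Q(x) = x^2; it gives more weight to small values."""
    return x ** 2


# The same Q is reused for N = 2, 3, and 5 without defining any weighting vector.
print(owa_q(Q, [0.2, 0.9]))
print(owa_q(Q, [0.2, 0.9, 0.4]))
print(owa_q(Q, [0.2, 0.9, 0.4, 0.7, 0.1]))
```

With a weighting vector, a separate vector would have to be supplied for each value of N.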

This definition is equivalent to the previous one based on a weighting
vector. This is so because, with $w_i$ defined from Q as $w_i = Q(i/N) - Q((i-1)/N)$,
we have $OWA_Q(a_1, \ldots, a_N)$ equal to $OWA_w(a_1, \ldots, a_N)$. Similarly,
Q can be defined as a function that interpolates the points $\{(i/N, \sum_{j \le i} w_j)\}$
for $i \in \{0, 1, \ldots, N\}$ so that $OWA_Q$ and $OWA_w$ are equivalent.

Note that the function interpolated from $\{(i/N, Q(i/N))\}$ for $i \in \{0, \ldots, N\}$,
and, in general, the function Q, corresponds to the function $w^*$ in the WOWA
definition (Definition 6.9). This makes clear that the functions $w^*$ displayed
in Figure 6.4 are regular nondecreasing fuzzy quantifiers that can be used
with the OWA operator to give, as before, most importance, respectively, to
large, medium, or small values, or to assign equal importance to all values.
Consequently, the same construction can be applied to the WOWA operator.
The corresponding definition for the WOWA operator follows.


Definition 6.14. Let Q be a regular nondecreasing fuzzy quantifier, and let
p be a weighting vector of dimension N; then, a mapping $WOWA : \mathbb{R}^N \to \mathbb{R}$
is a Weighted Ordered Weighted Averaging (WOWA) operator of dimension
N if

$$WOWA_{p,Q}(a_1, \ldots, a_N) = \sum_{i=1}^{N} \omega_i a_{\sigma(i)},$$

where $\sigma$ is defined as in the case of the OWA, and the weight $\omega_i$ is defined as

$$\omega_i = Q\Big(\sum_{j \le i} p_{\sigma(j)}\Big) - Q\Big(\sum_{j < i} p_{\sigma(j)}\Big).$$

Naturally, this definition is equivalent to the $WOWA_{p,w}$ in Definition 6.9.

Fig. 6.5. Representation of the fuzzy quantifier $Q_2$ in Example 6.15


Properties 


Equivalence between $OWA_w$ and $OWA_Q$ and between $WOWA_{p,w}$ and
$WOWA_{p,Q}$ makes clear that the properties obtained for $OWA_Q$ and $WOWA_{p,Q}$
are analogous to the ones obtained for $OWA_w$ and $WOWA_{p,w}$. In particular,
WOWA generalizes the weighted mean (when the quantifier is $Q(x) = x$) and
the $OWA_Q$ when p = (1/N, ..., 1/N). Also, $OWA_Q$ generalizes the arithmetic
mean (with $Q(x) = x$).

OWA with the fuzzy quantifier “for all” (recall from Section 2.3.4 that
this is $Q(1) = 1$ and $Q(x) = 0$ for all $x \ne 1$) is equivalent to minimum, and
OWA with the fuzzy quantifier “there exists” (recall that this is $Q(0) = 0$ and
$Q(x) = 1$ for all $x \ne 0$) is equal to maximum.

The rationale of this is that minimum defines a lower bound in which all
sources agree, and maximum defines an upper bound in which only one source
agrees. For other quantifiers, values are between minimum and maximum. The
nearer a quantifier is to “there exists,” the larger the output of the OWA
operator. When the quantifier is “exactly 50%,” the OWA operator is the
median (the middle value). This is illustrated in the following example.


Example 6.15. Let us consider the quantifiers $Q_1$, $Q_2$, $Q_3$, and $Q_4$ defined as
follows (quantifier $Q_2$ is represented in Figure 6.5):

• $Q_1(x) = x^{1/4}$
• $Q_2(x) = \begin{cases} 0 & \text{if } x < 1/5 \\ \frac{5}{3}\left(x - \frac{1}{5}\right) & \text{if } 1/5 \le x \le 4/5 \\ 1 & \text{if } x > 4/5 \end{cases}$
• $Q_3(x) = x^4$
• $Q_4(x) = x$

Note that $Q_1$ is a quantifier similar to “there exists,” with a shape similar
to the one in Figure 6.4 (a); $Q_2$ is a quantifier similar to “exactly 50%,” with
a shape similar to the one in Figure 6.4 (b); $Q_3$ is similar to “for all,” with
a shape similar to the one in Figure 6.4 (c); and, finally, $Q_4$ corresponds to
equal relevance for all inputs (Figure 6.4 (d)).

Consider the input vector a = (0.5, 0.25, 0.8, 0.0, 0.75). Then, the OWA operator
applied to this vector a with respect to the quantifiers defined above gives
the following results:

$OWA_{Q_1}(0.5, 0.25, 0.8, 0.0, 0.75) = 0.6887$
$OWA_{Q_2}(0.5, 0.25, 0.8, 0.0, 0.75) = 0.5$
$OWA_{Q_3}(0.5, 0.25, 0.8, 0.0, 0.75) = 0.1412$
$OWA_{Q_4}(0.5, 0.25, 0.8, 0.0, 0.75) = 0.4600$

The results show that, as expected, the smallest value is obtained with
$Q_3$, and the largest value is obtained with $Q_1$. The value obtained with $Q_2$ is
a value in between:

$$OWA_{Q_3}(a) \le OWA_{Q_2}(a) \le OWA_{Q_1}(a).$$

Also, $OWA_{Q_4}$ corresponds to the arithmetic mean of the values in a. Note
also that $OWA_{Q_4}(a) < OWA_{Q_2}(a)$ because $Q_2$ does not give much importance
to extreme values, and, thus, its result corresponds to the average of
(0.25, 0.5, 0.75).
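
The four results can be recomputed with a short sketch. It assumes that $Q_2$ is the piecewise-linear quantifier that is 0 up to 1/5, rises linearly, and equals 1 from 4/5 on, which is consistent with its result being the average of the three central values; the function names are illustrative.

```python
def owa_q(Q, values):
    """OWA_Q: weights Q(i/N) - Q((i-1)/N) applied to values in decreasing order."""
    n = len(values)
    ordered = sorted(values, reverse=True)
    return sum((Q(i / n) - Q((i - 1) / n)) * a
               for i, a in enumerate(ordered, start=1))


def Q1(x):
    return x ** 0.25


def Q2(x):
    # Assumed shape: 0 up to 1/5, linear in between, 1 from 4/5 on.
    return min(max((x - 0.2) / 0.6, 0.0), 1.0)


def Q3(x):
    return x ** 4


def Q4(x):
    return x


a = (0.5, 0.25, 0.8, 0.0, 0.75)
results = [owa_q(Q, a) for Q in (Q1, Q2, Q3, Q4)]
print([round(r, 4) for r in results])
```

The computed values agree with those of the example up to rounding of the last digit, and they respect the ordering between $Q_3$, $Q_2$, and $Q_1$ stated above.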


Let us consider now the application of the WOWA operator to the same 
vector a, and compare its results with the OWA and weighted mean. 


Example 6.16. Let us consider the quantifiers $Q_1(x) = x^{1/4}$ and $Q_2$ (the
piecewise-linear quantifier of Figure 6.5), as in the previous example, and let
$p_1 = (0.1, 0.1, 0.6, 0.1, 0.1)$ and $p_2 = (0.1, 0.1, 0.1, 0.6, 0.1)$ be two weighting
vectors. The WOWA operator with $p_1$ will return a value larger than the one of
the OWA operator because $a_3$ is the largest input, and this is the value with the
largest weight. In contrast, the WOWA operator with $p_2$ will return a value
smaller than the one of the OWA operator because $a_4$ is the smallest input.
This can be observed in the following computations:

$WOWA_{p_2,Q_1} \le OWA_{Q_1} \le WOWA_{p_1,Q_1}$
$WOWA_{p_2,Q_2} \le OWA_{Q_2} \le WOWA_{p_1,Q_2}$

because

$WOWA_{p_1,Q_1}(0.5, 0.25, 0.8, 0.0, 0.75) = 0.7526$
$WOWA_{p_1,Q_2}(0.5, 0.25, 0.8, 0.0, 0.75) = 0.7416$
$WOWA_{p_2,Q_1}(0.5, 0.25, 0.8, 0.0, 0.75) = 0.5791$
$WOWA_{p_2,Q_2}(0.5, 0.25, 0.8, 0.0, 0.75) = 0.1250$

Below, we give the $\omega$ weighting vectors that are used for each computation
of the WOWA operator. It can be observed that the weights $\omega_i$ in the case of
computing $WOWA_{p_2,Q_1}$ are larger for smaller values and smaller for larger
values in comparison with the weights used for computing the $OWA_{Q_1}$.

$\omega$ for $WOWA_{p_1,Q_1}$ = (0.880, 0.035, 0.031, 0.028, 0.026)
$\omega$ for $WOWA_{p_1,Q_2}$ = (0.666, 0.167, 0.167, 0.0, 0.0)
$\omega$ for $WOWA_{p_2,Q_1}$ = (0.563, 0.106, 0.071, 0.055, 0.205)
$\omega$ for $WOWA_{p_2,Q_2}$ = (0.0, 0.0, 0.167, 0.167, 0.666)
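
The four aggregated values and their weight vectors can be recomputed from Definition 6.14. The sketch below assumes that the second quantifier is the piecewise-linear one (0 up to 1/5, 1 from 4/5 on), which is the shape that matches the listed weight vectors; the name `wowa_q` is an illustrative choice.

```python
def wowa_q(p, Q, values):
    """Return (WOWA_{p,Q}(values), omega) following Definition 6.14."""
    sigma = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    omega, acc, prev = [], 0.0, 0.0
    for i in sigma:
        acc = min(acc + p[i], 1.0)      # guard against rounding above 1
        cur = Q(acc)
        omega.append(cur - prev)
        prev = cur
    value = sum(o * values[i] for o, i in zip(omega, sigma))
    return value, omega


def Q1(x):
    return x ** 0.25


def Q2(x):
    # Assumed piecewise-linear quantifier: 0 up to 1/5, 1 from 4/5 on.
    return min(max((x - 0.2) / 0.6, 0.0), 1.0)


a = (0.5, 0.25, 0.8, 0.0, 0.75)
p1 = (0.1, 0.1, 0.6, 0.1, 0.1)
p2 = (0.1, 0.1, 0.1, 0.6, 0.1)

for name, p, Q in [("p1,Q1", p1, Q1), ("p1,Q2", p1, Q2),
                   ("p2,Q1", p2, Q1), ("p2,Q2", p2, Q2)]:
    value, omega = wowa_q(p, Q, a)
    print(name, round(value, 4), [round(o, 3) for o in omega])
```

The output agrees with the values and weight vectors listed above to roughly three decimal places.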


When $Q(x) = x$ for all x in [0, 1], the WOWA reduces to the weighted mean,
because the quantifier states that all elements are equally relevant.


6.2 Choquet Integral


Now we consider another tool for aggregation. From a definitional point of 
view, its main difference with the previous tools is its use of fuzzy measures. 
Such measures have been studied in detail in Chapter 5, and some interpre- 
tations were given in Section 5.1.1. 

In the methods seen so far, we have only considered a single weight for each 
data element (except for WOWA). Besides, we have not (explicitly) considered 
how to measure the importance of a set of sources. For example, in the case of 
the multicriteria decision making problem of car selection, we can state that 
the comfort criterion has an importance of 0.3, and that price is more
important, and, thus, its weight is 0.5. However, we have not considered the 
importance of comfort and price when considered together. 

In Chapter 5, we have shown that fuzzy measures are functions defined over 
the parts of a set that satisfy monotonicity (the larger the set, the larger the 
measure) and boundary conditions (the measure of the whole set is 1). These 
restrictions permit us to interpret the measure of a set as the measure of its
importance: when information sources are added, the importance increases;
when all sources are considered, their importance is maximum and equal to
1. Section 5.1.1 considers interpretations of fuzzy measures with respect to
aggregation. 

Fuzzy measures permit us to incorporate considerations not included in 
the weights for the weighted means and the OWA operator. In particular, 
they can be used to express redundancy, complementariness, and interactions 
among information sources or criteria. Therefore, tools that use fuzzy mea- 
sures to represent background knowledge permit the consideration of sources 
that are not independent. The Choquet integral is one of these tools. Sugeno 
integrals and fuzzy t-conorm integrals (see Sections 6.4 and 6.5) are some 
other examples. 

Below, we give a definition of the Choquet integral together with an
equivalent alternative expression. The Choquet integral is defined as the integral of
a function f with respect to a fuzzy measure $\mu$. In our case, both the function
and the measure are based on the set of information sources $X = \{x_1, \ldots, x_N\}$.
The function $f : X \to \mathbb{R}^+$ corresponds to the values that the sources supply
(i.e., $f(x_i) = a_i$, using, as before, $a_i$ to denote the ith input value) and the
fuzzy measure assigns importances to subsets of X (thus, $\mu : \wp(X) \to [0, 1]$).


Definition 6.17. Let $\mu$ be a fuzzy measure on X; then, the Choquet integral
of a function $f : X \to \mathbb{R}^+$ with respect to the fuzzy measure $\mu$ is defined by

$$(C)\int f\,d\mu = \sum_{i=1}^{N} \big[f(x_{s(i)}) - f(x_{s(i-1)})\big]\,\mu(A_{s(i)}), \qquad (6.1)$$

where $f(x_{s(i)})$ indicates that the indices have been permuted so that
$0 \le f(x_{s(1)}) \le \cdots \le f(x_{s(N)}) \le 1$, and where $f(x_{s(0)}) = 0$ and
$A_{s(i)} = \{x_{s(i)}, \ldots, x_{s(N)}\}$.

Fig. 6.6. Interpreting Choquet integral as a weighting of segments


To denote a Choquet integral, when no confusion exists over the domain
X, we will use the notation $CI_\mu(a_1, \ldots, a_N) = (C)\int f\,d\mu$, where $f(x_i) = a_i$,
as before. A Choquet integral can be expressed in an alternative way according
to the next proposition.


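Proposition 6.18. Let $\mu$ be a fuzzy measure on X; then, the Choquet integral
of a function $f : X \to \mathbb{R}^+$ with respect to $\mu$ can be expressed as

$$(C)\int f\,d\mu = \sum_{i=1}^{N} f(x_{\sigma(i)})\big[\mu(A_{\sigma(i)}) - \mu(A_{\sigma(i-1)})\big] \qquad (6.2)$$

or as

$$(C)\int f\,d\mu = \sum_{i=1}^{N} f(x_{s(i)})\big[\mu(A_{s(i)}) - \mu(A_{s(i+1)})\big], \qquad (6.3)$$

where $\{\sigma(1), \ldots, \sigma(N)\}$ is a permutation of $\{1, \ldots, N\}$ such that
$f(x_{\sigma(1)}) \ge f(x_{\sigma(2)}) \ge \cdots \ge f(x_{\sigma(N)})$, where $A_{\sigma(k)} = \{x_{\sigma(j)} \mid j \le k\}$ (or, equivalently,
$A_{\sigma(k)} = \{x_{\sigma(1)}, \ldots, x_{\sigma(k)}\}$ when $k \ge 1$ and $A_{\sigma(0)} = \emptyset$), and where s and $A_{s(i)}$
are as in Definition 6.17, with $A_{s(N+1)} = \emptyset$.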


Expressions (6.1) and (6.2) highlight different aspects of the Choquet
integral:


1. Equation 6.1 shows that, for each segment defined by two consecutive
values $f(x_{s(i)})$ and $f(x_{s(i-1)})$, the Choquet integral weights the length of the
segment, $f(x_{s(i)}) - f(x_{s(i-1)})$, according to the measure of all the sources
that supply values greater than or equal to $f(x_{s(i)})$. This is so because
$A_{s(i)} = \{x_{s(i)}, \ldots, x_{s(N)}\}$. Figure 6.6 illustrates this process. In this figure,
the segment $[0, a_{s(1)}]$ is weighted by $\mu(A_{s(1)}) = \mu(\{x_{s(1)}, \ldots, x_{s(N)}\}) =
\mu(\{x_{s(j)} \mid j \ge 1\})$. Similarly, the segment $[a_{s(3)}, a_{s(4)}]$ is weighted by
$\mu(A_{s(4)}) = \mu(\{x_{s(j)} \mid j \ge 4\})$. This interpretation of the integral is studied
in more detail in Section 6.2.1.

2. Equation 6.2 shows that the Choquet integral is a linear combination
of values in a way similar to the weighted mean or the OWA operator.
Note that each value $f(x_{\sigma(i)})$ is combined with the weight
$v_i = \mu(A_{\sigma(i)}) - \mu(A_{\sigma(i-1)})$, and the weights define a weighting vector. Note
that $\sum_{i=1}^{N} v_i = \sum_{i=1}^{N} \big(\mu(A_{\sigma(i)}) - \mu(A_{\sigma(i-1)})\big) = \mu(A_{\sigma(N)}) - \mu(A_{\sigma(0)}) = 1$.


Let X be a reference set, let $(X, \mathcal{A})$ be a measurable space, let $\mu$ be a fuzzy measure
on $(X, \mathcal{A})$, and let f be a measurable function $f : X \to [0,1]$; then, the Choquet
integral of f with respect to $\mu$ is defined by

$$C_\mu(f) = \int_0^\infty \mu_f(r)\,dr,$$

where $\mu_f(r) = \mu(\{x \mid f(x) > r\})$.


Fig. 6.7. Choquet integral in a continuous domain


The use of fuzzy measures implies that the importance of a set is fixed
beforehand, and, therefore, it is not influenced by the values supplied by the
sources. That is, $\mu(A)$ does not depend on the actual values of f(x) for $x \in A$.
However, the Choquet integral is influenced by the values, and the weight $v_i$
computed for a certain value $f(x_{\sigma(i)})$ depends on the source $x_{\sigma(i)}$ and on
the sources $x_{\sigma(1)}, \ldots, x_{\sigma(i-1)}$ because, as shown in the second point above,
$v_i = \mu(A_{\sigma(i)}) - \mu(A_{\sigma(i-1)})$.
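
Equations (6.1) and (6.2) can be sketched as follows. The fuzzy measure `mu` is a hypothetical example on three sources (indexed 0, 1, 2), stored as a dictionary from subsets to values; it is monotone and non-additive, and is not taken from the text.

```python
def choquet_segments(mu, f):
    """Equation (6.1): segment lengths weighted by the measure of the sources above."""
    s = sorted(range(len(f)), key=lambda i: f[i])          # increasing values
    total, prev = 0.0, 0.0
    for k, i in enumerate(s):
        total += (f[i] - prev) * mu[frozenset(s[k:])]      # A_s(k) = sources from rank k up
        prev = f[i]
    return total


def choquet_weights(mu, f):
    """Equation (6.2): returns the integral and the weights v_i."""
    sigma = sorted(range(len(f)), key=lambda i: f[i], reverse=True)
    v, prev = [], 0.0
    for k in range(len(sigma)):
        cur = mu[frozenset(sigma[:k + 1])]                 # A_sigma(k) = k largest sources
        v.append(cur - prev)
        prev = cur
    return sum(vi * f[i] for vi, i in zip(v, sigma)), v


# Hypothetical monotone, non-additive fuzzy measure on the sources 0, 1, 2.
mu = {
    frozenset(): 0.0,
    frozenset({0}): 0.2, frozenset({1}): 0.3, frozenset({2}): 0.1,
    frozenset({0, 1}): 0.7, frozenset({0, 2}): 0.4, frozenset({1, 2}): 0.5,
    frozenset({0, 1, 2}): 1.0,
}

for f in [(0.6, 0.2, 0.9), (0.9, 0.2, 0.6)]:
    value, v = choquet_weights(mu, f)
    print(f, round(choquet_segments(mu, f), 4), round(value, 4),
          [round(x, 2) for x in v])
```

Both expressions return the same value for each input, while the weights $v_i$ change when the ordering of the sources changes, even though the measure is fixed.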

Another equivalent expression for the Choquet integral can be given in 
terms of the simple functions. 


Definition 6.19. A function $f : X \to [0,1]$ is a simple function if its image
f(X) is finite. Let the image f(X) be $\{a_1, \ldots, a_N\}$, where
$0 = a_0 < a_1 \le a_2 \le \cdots \le a_N \le 1$. Then, there exists a family of sets
$X = A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots \supseteq A_N$ with characteristic functions $\chi_{A_i}$ such that

$$f(x) = \max_i a_i \chi_{A_i}(x).$$

This expression is equivalent to

$$f(x) = \sum_{i=1}^{N} (a_i - a_{i-1})\,\chi_{A_i}(x).$$

For simple functions f, the Choquet integral can be expressed by

$$(C)\int f\,d\mu = \sum_{i=1}^{N} (a_i - a_{i-1})\,\mu(A_i).$$

Finally, for the sake of completeness, we give in Figure 6.7 an expression 
for the Choquet integral when the set X is not finite. 


6.2.1 Construction of Choquet Integral 


We have just given the definition of Choquet integrals where the aggregated
value is computed as the integral of a function with respect to a measure. In
fact, the weighted mean can also be seen from this point of view. A weighted
mean with a weighting vector $p = (p_1, \ldots, p_N)$ can be interpreted as an
integral with respect to an additive measure defined on the singletons by
$\mu(\{x_i\}) = p_i$.

Definition 6.20. Let $\mu$ be an additive fuzzy measure; then, the integral of a
function $f : X \to \mathbb{R}^+$ (with $a_i = f(x_i)$) with respect to $\mu$ is

$$WM_p(a_1, \ldots, a_N) = \int f\,d\mu = \sum_{x \in X} f(x)\,\mu(\{x\}). \qquad (6.4)$$

Here, $p = (\mu(\{x_1\}), \ldots, \mu(\{x_N\}))$.


Note that this is the expected value of the (multi)set f(x).
For additive fuzzy measures, Equation 6.4 can be rewritten into several
equivalent expressions:

$$\begin{aligned}
\int f\,d\mu &= \sum_{x \in X} f(x)\,\mu(\{x\}) && (6.5) \\
&= \sum_{i=1}^{R} b_i\,\mu(\{x \mid f(x) = b_i\}) && (6.6) \\
&= \sum_{i=1}^{N} (a_i - a_{i-1})\,\mu(\{x \mid f(x) \ge a_i\}) && (6.7) \\
&= \sum_{i=1}^{N} (a_i - a_{i-1})\,\big(1 - \mu(\{x \mid f(x) < a_i\})\big), && (6.8)
\end{aligned}$$

where $a_i = f(x_i)$ (with the $a_i$ indexed in increasing order and $a_0 = 0$) and
where $b_i$ corresponds to the ith value in increasing order
in the set $\{f(x) \mid x \in X\}$ (that is, the set of values in the range of f(x)) and
R is the cardinality of this set. Naturally, $0 \le b_1 < b_2 < \cdots < b_R$. Also, note
that $b_i \ne b_j$ for $i \ne j$, although it is possible that $f(x_i) = f(x_j)$ for $i \ne j$.

Figure 6.8 illustrates this case. In the figure, N = 6 and $X = \{x_1, \ldots, x_N\}$.
Then, we have $a_1 < a_2 < a_3 = a_4 < a_5 < a_6$, and, thus, $b_1 = a_1 < b_2 = a_2 <
b_3 = a_3 = a_4 < b_4 = a_5 < b_5 = a_6$. Thus, R = 5.

When the integral is seen as a way to compute the area under the func- 
tion f, each expression corresponds to a different way of computing it. Fig- 
ures 6.8 (a), (b), and (c) give a graphical interpretation of Expressions 6.5, 6.6, 
and 6.7. Note that this is not an area, because we are considering measures 
of the elements in X. 

In Figure 6.8 (a), the area (recall that this is not an area) is decomposed
into blocks, each with one element of X. Let x be one of the elements in X
and let $b_i = f(x)$. The area of each block is $f(x) \cdot \mu(\{x\}) = b_i \cdot \mu(\{x\})$. In
Figure 6.8 (b), the integral is decomposed into blocks of elements x with the
same f(x). That is, the area of all x such that $f(x) = b_i$ is computed together
and is defined as $b_i \cdot \mu(\{x \mid f(x) = b_i\})$. In Figure 6.8 (c), blocks are defined
according to the values f(x). For each $a_i$, elements with a value greater than
or equal to $a_i$ are selected. In this way, the area is decomposed into blocks
according to the pairs of consecutive values $a_{i-1}$, $a_i$ in the range of f(x). For
example, in the case of the consecutive values $a_{i-1}$ and $a_i$, the area of the
block is $(a_i - a_{i-1}) \cdot \mu(\{x \mid f(x) \ge a_i\})$.

Fig. 6.8. Graphical interpretation of several integrals: (a) according to
Expression 6.5; (b) according to Expression 6.6; (c) according to Expression 6.7

When measures are additive, the three expressions are equivalent. However,
for non-additive fuzzy measures, they are usually different. From the
point of view of aggregation, only Expression 6.7 is meaningful because (6.5)
and (6.6) do not always satisfy the constraint that the integral is a value
greater than or equal to the minimum and less than or equal to the maximum.
That is, among Expressions (6.5), (6.6), and (6.7), only Expression 6.7
satisfies

$$\min(f(x_1), \ldots, f(x_N)) \le \int f\,d\mu \le \max(f(x_1), \ldots, f(x_N)) \qquad (6.9)$$

for all fuzzy measures $\mu$.

Moreover, Expression 6.5 does not use interactions of sources, as the measure
is only applied to the singletons. Expression 6.6 exhibits a similar behavior.
This is so because in practical applications most values f(x) will be
different (the cardinalities of the sets $\{f(x) \mid x \in X\}$ and X are similar) and,
therefore, the measure will be also applied only to singletons.
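
A small numerical experiment illustrates the difference between the three expressions. The sketch below evaluates (6.5), (6.6), and (6.7) for an additive measure and for a non-additive one; both measures and the input vector are hypothetical examples chosen for this illustration.

```python
from itertools import combinations


def expr_6_5(mu, f):
    """Sum over the elements: f(x) * mu({x})."""
    return sum(fx * mu[frozenset([i])] for i, fx in enumerate(f))


def expr_6_6(mu, f):
    """Sum over the distinct values b: b * mu({x | f(x) = b})."""
    return sum(b * mu[frozenset(i for i, fx in enumerate(f) if fx == b)]
               for b in sorted(set(f)))


def expr_6_7(mu, f):
    """Sum over increasing values a: (a - previous a) * mu({x | f(x) >= a})."""
    total, prev = 0.0, 0.0
    for a in sorted(f):
        total += (a - prev) * mu[frozenset(i for i, fx in enumerate(f) if fx >= a)]
        prev = a
    return total


# Additive measure built from the (assumed) weights p.
p = (0.2, 0.3, 0.5)
additive = {frozenset(c): sum(p[i] for i in c)
            for r in range(4) for c in combinations(range(3), r)}

# Hypothetical monotone, non-additive measure.
non_additive = {
    frozenset(): 0.0,
    frozenset({0}): 0.2, frozenset({1}): 0.3, frozenset({2}): 0.1,
    frozenset({0, 1}): 0.7, frozenset({0, 2}): 0.4, frozenset({1, 2}): 0.5,
    frozenset({0, 1, 2}): 1.0,
}

f = (0.5, 0.5, 0.6)
for name, m in [("additive", additive), ("non-additive", non_additive)]:
    print(name, [round(e(m, f), 4) for e in (expr_6_5, expr_6_6, expr_6_7)])
```

For the additive measure the three values coincide. For the non-additive one, the first two fall below min(f) = 0.5, whereas the third stays between the minimum and the maximum.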


6.2.2 Properties 


The Choquet integral satisfies the requirements for aggregation operators.
That is, it satisfies unanimity and monotonicity. Therefore, it also satisfies
internality. Additionally, it satisfies positive homogeneity ($C(af) = aC(f)$ for
positive a) and monotonicity on the measure (i.e., if $\mu(A) \le \mu'(A)$ for all
$A \subseteq X$, then $CI_\mu(f) \le CI_{\mu'}(f)$ for all functions f). Another property that
the Choquet integral satisfies is horizontal additivity. This property is defined
as follows.


Definition 6.21. Let $f : X \to [0,1]$; then, for $c \in [0,1]$, let us define $f_c^+$ as
follows:

$$f_c^+(x) = \begin{cases} 0 & \text{if } f(x) \le c \\ f(x) - c & \text{if } f(x) > c. \end{cases}$$

Then, $f = (f \wedge c) + f_c^+$ is a horizontal additive decomposition of f.


The Choquet integral is horizontal additive because, for any decomposition
of this type,

$$CI_\mu(f) = CI_\mu(f \wedge c) + CI_\mu(f_c^+).$$
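
Horizontal additivity can be checked numerically. The sketch below splits a function at several levels c and compares both sides of the equality; the fuzzy measure and the input vector are hypothetical and serve only as an illustration.

```python
def choquet(mu, f):
    """Choquet integral via segments: (f_s(i) - f_s(i-1)) * mu(A_s(i))."""
    s = sorted(range(len(f)), key=lambda i: f[i])
    total, prev = 0.0, 0.0
    for k, i in enumerate(s):
        total += (f[i] - prev) * mu[frozenset(s[k:])]
        prev = f[i]
    return total


# Hypothetical monotone, non-additive measure on the sources 0, 1, 2.
mu = {
    frozenset(): 0.0,
    frozenset({0}): 0.2, frozenset({1}): 0.3, frozenset({2}): 0.1,
    frozenset({0, 1}): 0.7, frozenset({0, 2}): 0.4, frozenset({1, 2}): 0.5,
    frozenset({0, 1, 2}): 1.0,
}

f = (0.6, 0.2, 0.9)
for c in (0.1, 0.5, 0.8):
    f_low = tuple(min(v, c) for v in f)            # f AND c (pointwise minimum)
    f_high = tuple(max(v - c, 0.0) for v in f)     # the part of f above level c
    print(c, round(choquet(mu, f), 6),
          round(choquet(mu, f_low) + choquet(mu, f_high), 6))
```

For every level c the two printed values coincide, as the property requires.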


Due to this property, the Choquet integral is also called horizontal integral. 
Now, we present a representation theorem for the Choquet integral. This 
result is similar in spirit to the propositions given in Chapter 4. It estab- 
lishes that when some conditions are satisfied, the appropriate operator is the 
Choquet integral. The properties considered are based on comonotonicity. 


Definition 6.22. Let X be a reference set, and let f, g be functions
$f, g : X \to [0, 1]$. Then,

• $f \le g$ when, for all $x_i$,
  $f(x_i) \le g(x_i)$
• f and g are comonotonic if, for all $x_i, x_j \in X$,
  $f(x_i) < f(x_j)$ implies that $g(x_i) \le g(x_j)$
• C is comonotonic monotone if and only if, for comonotonic f and g,
  $f \le g$ implies that $C(f) \le C(g)$
• C is comonotonic additive if and only if, for comonotonic f and g,
  $C(f + g) = C(f) + C(g)$

Taking into account these properties, the following theorem can be proved.


Theorem 6.23. Let C be an aggregation operator with the following
properties:

• C is comonotonic monotone
• C is comonotonic additive
• $C(1, \ldots, 1) = 1$

Then, there exists a fuzzy measure $\mu$ such that C(f) is the Choquet integral of
f with respect to $\mu$.

It is also true that a Choquet integral satisfies the conditions above. Now, we
review some results that relate this integral to WOWA, OWA, and weighted
mean.


Proposition 6.24. For every weighting vector p, we have $WM_p = CI_{\mu_p}$, with
$\mu_p$ defined by

$$\mu_p(B) = \sum_{x_i \in B} p_i \quad \text{for all } B \subseteq X.$$

This proposition shows that the Choquet integral is a proper generalization
of weighted mean for non-additive measures. That is, when a measure is
additive, the Choquet integral reduces to a weighted mean.


Proposition 6.25. For every weighting vector w, we have $OWA_w = CI_{\mu_w}$,
with $\mu_w$ defined by

$$\mu_w(B) = \sum_{i=1}^{|B|} w_i \quad \text{for all } B \subseteq X.$$

Proposition 6.26. For every weighting vector p and every regular
nondecreasing fuzzy quantifier Q, we have $WOWA_{p,Q} = CI_{\mu_{p,Q}}$, with $\mu_{p,Q}$ defined
by

$$\mu_{p,Q}(B) = Q\Big(\sum_{x_i \in B} p_i\Big) \quad \text{for all } B \subseteq X.$$


Note that the last measure corresponds to a distorted probability
(Definition 5.61), that, when $p_i = 1/N$, we have a symmetric measure
(Proposition 5.72), and that with $Q(x) = x$ we have a probability distribution.

These propositions show that the weighted mean, OWA, and WOWA are
particular cases of Choquet integrals. There are some results that establish
the converse conditions; that is, when a Choquet integral can be reduced to
any of the former operators. The first result is about the symmetric Choquet
integral, i.e., the Choquet integral that satisfies the symmetry condition.


Proposition 6.27. Any symmetric Choquet integral CI, is an OWA opera- 
tor. 


Proposition 6.28. Any Choquet integral with a distorted probability is a 
WOWA operator. 


In fact, the quantifier Q (or the function w*) in the WOWA operator cor- 
responds to the distortion function in distorted probability, and the weighting 
vector p corresponds to the probability distribution in the distorted probabil- 
ity. 

The following example gives the distorted probability that can be built 
from the weighting vector p and a function w*. This fuzzy measure cor- 
responds to the one used in Example 6.10, where WOWA was considered, 
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Fig. 6.9. Fuzzy quantifier constructed from the OWA weights w — (1/3, 1/3, 1/3, 0) 
in Example 6.10 


as the function w* was built in Example 6.11 from the weighting vector 
w — (1/3,1/3,1/3,0). For the sake of completeness, we also give the fuzzy 
measures that make the Choquet integral equal to the OWA operator and the 
weighted mean. 


Example 6.29. Let p = (0.5,0.3,0.15,0.05) be a weighting vector and let w* 
be the function shown in Figure 6.9 and defined as follows 


(s) = 2/0.75 if x < 0.75 
ML NU e if x 0.75 


Then, the distorted fuzzy measure $\mu_{WOWA}$ defined from p and w* is given in
Table 6.1. The construction of $\mu_{WOWA}$ follows Proposition 6.26. This table
also displays the measures $\mu_{WM}$ and $\mu_{OWA}$ constructed according to
Propositions 6.24 and 6.25.

Note that the Choquet integral with respect to $\mu_{WM}$ is equivalent to the
weighted mean, the Choquet integral with respect to $\mu_{OWA}$ is equivalent
to the OWA operator, and the Choquet integral with respect to $\mu_{WOWA}$ is
equivalent to the WOWA operator.
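As a minimal sketch of Example 6.29 (in Python; the helper names `choquet` and `build_measure` are our own and not taken from the text), the three measures can be tabulated from p, w, and w* following Propositions 6.24 to 6.26, and the Choquet integral with respect to each can be compared with the corresponding operator. The constraints C1 to C4 are indexed 0 to 3, and the input vector `a` is an illustrative choice.

```python
from itertools import combinations

def choquet(values, mu):
    """Discrete Choquet integral; mu maps frozensets of indices to [0, 1]."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # ascending values
    total, previous = 0.0, 0.0
    for pos, i in enumerate(order):
        upper = frozenset(order[pos:])      # sources whose value is >= values[i]
        total += (values[i] - previous) * mu[upper]
        previous = values[i]
    return total

def build_measure(n, set_function):
    """Tabulate set_function on every subset of {0, ..., n-1}."""
    return {frozenset(c): set_function(frozenset(c))
            for r in range(n + 1) for c in combinations(range(n), r)}

def w_star(x):
    """Quantifier of Example 6.29."""
    return min(x / 0.75, 1.0)

p = (0.5, 0.3, 0.15, 0.05)      # weighting vector p
w = (1 / 3, 1 / 3, 1 / 3, 0.0)  # OWA weights

mu_wm = build_measure(4, lambda A: sum(p[i] for i in A))            # Prop. 6.24
mu_owa = build_measure(4, lambda A: sum(w[:len(A)]))                # Prop. 6.25
mu_wowa = build_measure(4, lambda A: w_star(sum(p[i] for i in A)))  # Prop. 6.26

a = (1.0, 0.5, 0.0, 0.5)        # illustrative input values
weighted_mean = sum(pi * ai for pi, ai in zip(p, a))
owa = sum(wi * ai for wi, ai in zip(w, sorted(a, reverse=True)))

print(choquet(a, mu_wm), weighted_mean)
print(choquet(a, mu_owa), owa)
print(choquet(a, mu_wowa))
```

The measures are stored as dictionaries keyed by `frozenset`, which keeps the sketch short but needs $2^N$ entries, so it is only practical for small N.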


Now, we consider the Choquet integral of a crisp set. 


Proposition 6.30. Let A be a crisp subset of X; then, the Choquet integral of
A with respect to $\mu$ is $\mu(A)$. Here, the integral of A corresponds to the integral
of its characteristic function, or, in other words, to the integral of the function
$f_A$ defined as $f_A(x) = 1$ if and only if $x \in A$.


The proposition gives some hints about the definition of fuzzy measures. 

Let $f_B$ be the membership function of a fuzzy set $B \subseteq A$. Then, due to the fact
that the Choquet integral is monotone, for all $\mu$, we have $CI_{\mu}(f_B) \leq CI_{\mu}(f_A)$.
According to this, the Choquet integral can be interpreted as a measure of
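the fuzzy set B.

Set                  $\mu_{WM}$   $\mu_{OWA}$   $\mu_{WOWA}$
$\emptyset$          0.0          0.0           0.0
{C4}                 0.05         0.3333        0.0667
{C3}                 0.15         0.3333        0.2
{C3, C4}             0.20         0.6667        0.2667
{C2}                 0.30         0.3333        0.4
{C2, C4}             0.35         0.6667        0.4667
{C2, C3}             0.45         0.6667        0.6
{C2, C3, C4}         0.50         1.0           0.6667
{C1}                 0.50         0.3333        0.6667
{C1, C4}             0.55         0.6667        0.7333
{C1, C3}             0.65         0.6667        0.8667
{C1, C3, C4}         0.70         1.0           0.9333
{C1, C2}             0.80         0.6667        1.0
{C1, C2, C4}         0.85         1.0           1.0
{C1, C2, C3}         0.95         1.0           1.0
{C1, C2, C3, C4}     1.0          1.0           1.0

Table 6.1. Fuzzy measures for Example 6.29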


6.3 Weighted Minimum and Weighted Maximum 


The operators described in previous sections are based on the existence of a 
numerical scale where addition and multiplication are applicable (and mean- 
ingful). Alternative operators exist that do not rely on these. They only use 
maximum and minimum (and thus they can operate on ordinal scales). This 
is the case with weighted minimum, weighted maximum, and Sugeno integral. 
The last is in some sense the counterpart of the Choquet integral in the ordinal 
setting. We begin in this section with the weighted minimum and maximum. 
Section 6.4 is devoted to Sugeno integrals. 

In the definition given below, the weighted minimum also requires the exis- 
tence of a negation function (see Section 2.3.1) over the domain. For example, 
the negation neg(x) = 1 - x can be used when the values x are in the unit
interval. The negation function is used so that the weights can be interpreted 
as importance (as with the weighted mean). An alternative definition without 
negation might be given but the meaning of the weights would change (as 
they would then correspond to the negation of importance). 

It has been said that aggregation operators $C$ are defined so that they
satisfy

\[ \min(a_1, \ldots, a_N) \leq C(a_1, \ldots, a_N) \leq \max(a_1, \ldots, a_N). \]


Therefore, minimum and maximum are extreme cases of aggregation opera- 
tors. 

Minimum can be seen as the determination of the most conservative or 
most pessimistic value; for example, the minimum consensus among information
sources or the minimum value for which all information sources agree.
This interpretation is consistent with the use of the quantifier “for all” in
the OWA operator that, as shown in Section 6.1.4, is equivalent to minimum. 
Moreover, minimum can be seen as intersection or conjunction when applied 
to certainty degrees. This was shown in Section 2.3.4. 

In contrast, the maximum can be seen as the selection of the most pro- 
gressive, innovative, or optimistic value: it corresponds to an extreme opinion, 
as only one information source needs to supply this value. This interpretation 
corresponds to the consideration of the fuzzy quantifier "there exists" in the 
OWA operator. In this case, the OWA operator behaves like a union or dis- 
junction of the certainty degrees. 

Weighted minimum and weighted maximum permit us to include weights 
in the aggregation process. In this case, weights are represented by means of 
a weighting vector, but now, instead of having weights that are positive and 
add to one, they are positive but at least one is maximal. That is, one of the 
weights corresponds to the maximum value in the ordinal scale. In the case 
where the unit interval is used, at least one weight should be equal to 1. From 
now on, we will call the weighting vectors used in the weighted mean and 
OWA the probabilistic weighting vectors (they can be understood as probabil- 
ity distributions); and we will call the ones used in weighted minimum and 
weighted maximum the possibilistic weighting vectors (they can be understood 
as possibility distributions). 

In this way, while the weighted mean is the aggregation of data with respect 
to a probabilistic weighting vector (a probability distribution), the weighted 
minimum and the weighted maximum are the aggregation of data with respect 
to a possibilistic weighting vector (a possibility distribution). 


Definition 6.31. A vector $v = (v_1, \ldots, v_N)$ is a possibilistic weighting vector of
dimension N if and only if $v_i \in [0,1]$ and $\max_i v_i = 1$.


Definition 6.32. Let u be a weighting vector of dimension N; then, a mapping
$WMin : [0,1]^N \to [0,1]$ is a weighted minimum of dimension N if
$WMin_u(a_1, \ldots, a_N) = \min_i \max(neg(u_i), a_i)$.


Definition 6.33. Let u be a weighting vector of dimension N; then, a mapping
$WMax : [0,1]^N \to [0,1]$ is a weighted maximum of dimension N if
$WMax_u(a_1, \ldots, a_N) = \max_i \min(u_i, a_i)$.


In these definitions, the values in the possibility distribution can be understood
as certainty degrees. Alternative definitions can be given with a weighting
vector $v = (v_1, \ldots, v_N)$ where $v_i = neg(u_i)$. Note that, given such a
vector v, negation is not required for the weighted minimum.

The problem of combining constraints in a fuzzy CSP problem can also be 
solved using weighted minimum or weighted maximum. This is shown in the 
next example. 


Example 6.34. Let us consider again Example 6.8, with the same formulation
but with the importance of the constraints given by a possibilistic weighting
vector. Let this vector be u = (1, 0.5, 0.3, 0.1). Note that the weighting vector
in Example 6.8 is not suitable here because the possibilistic weighting vector
should be such that at least one of the weights is equal to 1 ($\max_i u_i = 1$). The
evaluations for the possible solutions with such a vector u using the WMin
are as follows:


• sat(2,2) = $WMin_u$(0, 0.5, 1, 1) = 0
• sat(2,3) = $WMin_u$(0.5, 0.5, 0.5, 0.5) = 0.5
• sat(2,4) = $WMin_u$(1, 0.5, 0, 0.5) = 0.5
• sat(3.5,2.5) = $WMin_u$(1, 0.5, 0.5, 0.5) = 0.5
• sat(3,2) = $WMin_u$(0.5, 1, 1, 0.5) = 0.5
• sat(3,3) = $WMin_u$(1, 1, 0.5, 1) = 0.7


Note that the computation of these evaluations needs neg(u). We have used
neg(x) = 1 - x, which corresponds to neg(u) = (0, 0.5, 0.7, 0.9).
In contrast, in the case of the WMax, the evaluations are as follows:


• sat(2,2) = $WMax_u$(0, 0.5, 1, 1) = 0.5
• sat(2,3) = $WMax_u$(0.5, 0.5, 0.5, 0.5) = 0.5
• sat(2,4) = $WMax_u$(1, 0.5, 0, 0.5) = 1
• sat(3.5,2.5) = $WMax_u$(1, 0.5, 0.5, 0.5) = 1
• sat(3,2) = $WMax_u$(0.5, 1, 1, 0.5) = 0.5
• sat(3,3) = $WMax_u$(1, 1, 0.5, 1) = 1


These results show that when using the weighted minimum the pair (3, 3) 
is the best pair, but that when using the weighted maximum it is indistin- 
guishable from the pairs (2,4) and (3.5, 2.5). 
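The evaluations of Example 6.34 can be reproduced with a short sketch of Definitions 6.32 and 6.33. The function names `wmin` and `wmax` and the dictionary `solutions` are our own labels; the negation neg(x) = 1 - x is the one used in the example.

```python
def wmin(u, a, neg=lambda x: 1 - x):
    """Weighted minimum: min_i max(neg(u_i), a_i)."""
    return min(max(neg(ui), ai) for ui, ai in zip(u, a))

def wmax(u, a):
    """Weighted maximum: max_i min(u_i, a_i)."""
    return max(min(ui, ai) for ui, ai in zip(u, a))

u = (1, 0.5, 0.3, 0.1)  # possibilistic weighting vector, max_i u_i = 1

# Satisfaction degrees of the four constraints for each candidate pair.
solutions = {
    (2, 2): (0, 0.5, 1, 1),
    (2, 3): (0.5, 0.5, 0.5, 0.5),
    (2, 4): (1, 0.5, 0, 0.5),
    (3.5, 2.5): (1, 0.5, 0.5, 0.5),
    (3, 2): (0.5, 1, 1, 0.5),
    (3, 3): (1, 1, 0.5, 1),
}

for pair, a in solutions.items():
    print(pair, round(wmin(u, a), 3), wmax(u, a))

best_by_wmin = max(solutions, key=lambda pair: wmin(u, solutions[pair]))
```

Both operators only compare values, apart from the negation, which is why they carry over to ordinal scales.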


We now give another example to illustrate the application of weighted
minimum and weighted maximum. In this case, we show their use in fuzzy
systems.


Example 6.35. Let us consider fuzzy inference systems where the rules follow
the structure given in Section 2.3.5. That is, we have a set of rules
$\{R_i\}_{i=1,\ldots,N}$ of the form

$R_i$: IF x is $A_i$ THEN y is $B_i$.

According to what has been described in Section 2.3.5, when rules are
disjunctive, the output of the fuzzy system, given $x = x_0$, is

\[ B = \bigvee_{i=1}^{N} (B_i \wedge A_i(x_0)), \tag{6.10} \]

which, for a particular $y_0$ in the domain of Y, leads to the following
membership degree:

\[ B(y_0) = \bigvee_{i=1}^{N} (B_i(y_0) \wedge A_i(x_0)). \]

This expression corresponds to the weighted maximum (Definition 6.33) of
the values $B_i(y_0)$ with respect to the weights $A_i(x_0)$. That is,

\[ B(y_0) = WMax_u(B_1(y_0), \ldots, B_N(y_0)), \tag{6.11} \]

where the weighting vector u is defined as $u = (A_1(x_0), \ldots, A_N(x_0))$.

Fig. 6.10. Considering uncertainty $\alpha$ in a fuzzy set $B_i$ with a triangular membership
function: (a) conjunctive rules; (b) disjunctive rules

Note that the weighting vector is independent of the value $y_0$. Thus, for a
given $x_0$, the same aggregation operator with the same weights is applied to
all $y_0$ in Y. Similarly, when rules are conjunctive, we have

\[ B(y_0) = \bigwedge_{i=1}^{N} I(A_i(x_0), B_i(y_0)). \tag{6.12} \]

In this case, with an appropriate selection of the implication $I$, we can rewrite
the expression in terms of a weighted minimum (Definition 6.32). In particular,
this is possible using the Kleene-Dienes implication. This implication
(Section 2.3.2, Equation 2.17) is defined as $I(x, y) = \max(1 - x, y)$. Thus, the
following equation holds for such implications:

\[ B(y_0) = \bigwedge_{i=1}^{N} I(A_i(x_0), B_i(y_0)) = \bigwedge_{i=1}^{N} \max(1 - A_i(x_0), B_i(y_0)). \tag{6.13} \]

The weighting vector u defined as $u = (A_1(x_0), \ldots, A_N(x_0))$ permits us to
rewrite $B(y_0)$ as follows:

\[ B(y_0) = WMin_u(B_1(y_0), \ldots, B_N(y_0)). \tag{6.14} \]
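The two output expressions (6.11) and (6.14) can be sketched as follows. Everything concrete here is hypothetical and chosen only for illustration: the two triangular consequents, the firing degrees $A_i(x_0)$ in `firing`, and the function names.

```python
def triangular(left, peak, right):
    """Return a triangular membership function with the given support and peak."""
    def membership(y):
        if left < y <= peak:
            return (y - left) / (peak - left)
        if peak < y < right:
            return (right - y) / (right - peak)
        return 0.0
    return membership

def wmax(u, a):
    """Weighted maximum (Definition 6.33)."""
    return max(min(ui, ai) for ui, ai in zip(u, a))

def wmin(u, a):
    """Weighted minimum (Definition 6.32) with neg(x) = 1 - x."""
    return min(max(1 - ui, ai) for ui, ai in zip(u, a))

consequents = [triangular(0, 2, 4), triangular(2, 4, 6)]  # hypothetical B_1, B_2
firing = (0.8, 0.3)                                       # hypothetical A_1(x_0), A_2(x_0)

def disjunctive_output(y0):
    """B(y_0) for disjunctive rules, Equation (6.11)."""
    return wmax(firing, [B(y0) for B in consequents])

def conjunctive_output(y0):
    """B(y_0) for conjunctive rules with the Kleene-Dienes implication, Equation (6.14)."""
    return wmin(firing, [B(y0) for B in consequents])

for y0 in (1, 2, 3, 4):
    print(y0, disjunctive_output(y0), conjunctive_output(y0))
```

The weights `firing` are computed once for a given $x_0$ and reused for every $y_0$, which mirrors the remark that the weighting vector does not depend on $y_0$.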


This example shows that, in fuzzy inference systems, WMax and WMin
are used for combining the certainty degrees of the output fuzzy sets $B_i(y_0)$
and the degrees of satisfaction of the rules $A_i(x_0)$. When the rules are
conjunctive, the combination is a weighted minimum, and the fuzzy sets $B_i$ are
transformed into $B_i \vee (1 - A_i(x_0))$. Figure 6.10 (a) illustrates this fact when
$\alpha = A_i(x_0)$ and when $B_i$ is represented as a triangular fuzzy set.

When the rules are disjunctive, the combination is a weighted maximum,
and the fuzzy sets $B_i$ are transformed into $B_i \wedge A_i(x_0)$. Figure 6.10 (b)
illustrates this fact when $\alpha = A_i(x_0)$.

We have reviewed two operators that somehow correspond to a translation
of the weighted mean (which combines values with respect to a probability
distribution) into operators that combine values with respect to possibility
distributions. This translation also exists for the OWA operator. The resulting
operators are OWMin and OWMax. Their definitions are similar to Definitions
6.32 and 6.33, but include an ordering permutation, as in the case of the
OWA (see Definition 6.4).


6.3.1 Properties of Weighted Minimum and Maximum 


The aggregation operators described in this section are not directly related
to the ones described in previous sections. Nevertheless, for some particular
weights, equalities can be given. For example, when u = (1, 1, ..., 1), we have
that $WMin_u = \min$ and $WMax_u = \max$.


6.3.2 Dealing with Symbolic Domains 


Note that, although definitions are given in the unit interval, the operators 
can be applied to symbolic ordinal domains. In this case, both weights and 
values have to be in the same domain, say L, and negation, when needed, 
is an involutive function that reverses the ordering in L. Because (according 
to the following proposition) conditions on negations for symbolic domains 
completely determine them, once L and the weights are known, the weighted 
minimum and the weighted maximum are completely determined. There is no 
need to define the negation function. 


Proposition 6.36. Let $L = \{l_0, \ldots, l_r\}$ with $l_0 <_L l_1 <_L \cdots <_L l_r$ be a finite
ordinal scale; then, there exists only one negation function, $neg : L \to L$,
that satisfies

(N1) if $x <_L x'$ then $neg(x) >_L neg(x')$ for all $x, x'$ in L.
(N2) $neg(neg(x)) = x$ for all x in L.

This negation function is defined by
\[ neg(l_i) = l_{r-i} \quad \text{for all } l_i \text{ in } L. \tag{6.15} \]
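A small sketch shows how Equation (6.15) lets the weighted minimum and maximum run directly on labels. The five-label scale, the weights, and the values below are hypothetical; only the order of the labels matters.

```python
scale = ["none", "low", "medium", "high", "full"]  # hypothetical l_0 < ... < l_r
rank = {label: i for i, label in enumerate(scale)}

def neg(label):
    """Negation of Equation (6.15): neg(l_i) = l_{r-i}."""
    return scale[len(scale) - 1 - rank[label]]

def wmin(u, a):
    """Weighted minimum on the ordinal scale; weights and values are labels."""
    return min((max(neg(ui), ai, key=rank.get) for ui, ai in zip(u, a)),
               key=rank.get)

def wmax(u, a):
    """Weighted maximum on the ordinal scale."""
    return max((min(ui, ai, key=rank.get) for ui, ai in zip(u, a)),
               key=rank.get)

u = ("full", "medium", "low")   # at least one weight is the top label
a = ("low", "high", "none")     # values supplied by three sources

print(wmin(u, a), wmax(u, a))
```

No arithmetic is performed on the labels: comparisons go through `rank`, and the negation is an index reversal.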


6.4 Sugeno Integrals 


In the same way that the Choquet integral generalizes the OWA and the 
weighted mean, the Sugeno integral generalizes the weighted minimum (and 
the weighted maximum). While the WMin (and WMax) express importance
or reliability through weighting vectors, the Sugeno integral uses fuzzy mea- 
sures. Recall that this shift from weighting vectors to fuzzy measures occurs 
when moving from the weighted mean to the Choquet integral. 

The Sugeno integral, due to the combination of weights and values through the
minimum, is defined for functions into [0, 1] and normalized fuzzy measures.

Fig. 6.11. Graphical interpretation of Sugeno integrals


Definition 6.37. Let $\mu$ be a fuzzy measure on X; then, the Sugeno integral
of a function $f : X \to [0,1]$ with respect to $\mu$ is defined by

\[ (S)\int f \, d\mu = \max_{i=1,\ldots,N} \min(f(x_{s(i)}), \mu(A_{s(i)})), \tag{6.16} \]

where $f(x_{s(i)})$ indicates that the indices have been permuted so that
$0 \leq f(x_{s(1)}) \leq \cdots \leq f(x_{s(N)}) \leq 1$ and $A_{s(i)} = \{x_{s(i)}, \ldots, x_{s(N)}\}$.


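Proposition 6.38. The Sugeno integral of a function $f : X \to [0,1]$ with
respect to a fuzzy measure $\mu$ can be equivalently expressed by

\[ \max_{i=1,\ldots,N} \min(f(x_{\sigma(i)}), \mu(A_{\sigma(i)})), \]

where $A_{\sigma(k)} = \{x_{\sigma(j)} \mid j \leq k\}$ (or, equivalently, $A_{\sigma(k)} = \{x_{\sigma(1)}, \ldots, x_{\sigma(k)}\}$
when $k \geq 1$ and $A_{\sigma(0)} = \emptyset$), and where $\sigma$ is a permutation such that
$f(x_{\sigma(i)}) \geq f(x_{\sigma(i+1)})$ for $i \geq 1$.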


To denote a Sugeno integral of a function f over the domain X, we will use,
when no confusion arises, the notation $SI_{\mu}(a_1, \ldots, a_N)$ for $(S)\int f \, d\mu$. Here,
$a_i = f(x_i)$ as before.

The Sugeno integral can be interpreted in a way similar to the weighted
maximum (note that the expressions are similar, but use the measure instead of
the possibility distribution). The difference is that now each value $f(x_{s(i)})$
is weighted according to the weight (the measure) of all the sources that
support that value (i.e., that supply a value larger than or equal to $f(x_{s(i)})$);
that is, according to $\mu(A_{s(i)})$.

Figure 6.11 (a) gives a graphical interpretation of the operation of Sugeno
integrals. The figure displays the values $f(x_{s(i)})$, the measures $\mu(A_{s(i)})$,
and the combination $\min(f(x_{s(i)}), \mu(A_{s(i)}))$ for all $x_i \in X$. The largest of
these values is the result of the Sugeno integral. In this figure, the values
are denoted by dots, the measures are denoted by crosses, and the values
$\min(f(x_{s(i)}), \mu(A_{s(i)}))$ are denoted by squares.

The graphical interpretation of the Sugeno integral shows that the outcome
is obtained by "saturation." We can see that the integral selects the
importance that overcomes (or saturates) certain thresholds. Due to the
ordering, the threshold is decreasing as long as inputs are increasing. In this
way, the integral finds a trade-off (or compromise) between the importance or
reliability of the sets (i.e., $\mu(A)$) and the values that the members of the sets
have assigned (i.e., the value $\max_{x \in A} f(x)$).

For a continuous monotone or convex function, the Sugeno integral computes
the length of the square with maximum area, as shown in Figure 6.11
(b) and (c). The definition of the Sugeno integral for continuous functions is
given in Figure 6.12.

Let X be a reference set, let $(X, \mathcal{A})$ be a measurable space, let $\mu$ be a fuzzy measure
on $(X, \mathcal{A})$, and let f be a measurable function $f : X \to [0,1]$; then, the Sugeno
integral of f with respect to $\mu$ is defined by

\[ S_{\mu}(f) := \sup_{r \in [0,1]} [r \wedge \mu_f(r)], \]

where $\mu_f(r) = \mu(\{x \mid f(x) > r\})$.

Fig. 6.12. Sugeno integral in a continuous domain

| set   | {Tokyo} | {Kyoto} | {Nagano} | {Tokyo, Kyoto} | {Kyoto, Nagano} | {Tokyo, Nagano} | X |
| $\mu$ | 0.7     |         |          | 0.9            | 0.6             | 0.8             | 1 |

Table 6.2. Fuzzy measure for the traveler example: Satisfaction degree for visiting
Tokyo, Kyoto, and Nagano

The following example illustrates the Sugeno integral. 


Example 6.39. Let us consider a decision making problem. There is a traveler
who intends to visit three cities in Japan, and considers different locations
for staying. The three cities, represented by $X = \{x_1, x_2, x_3\}$, correspond to
Tokyo, Kyoto, and Nagano.

Let us consider the degree of satisfaction of the traveler when visiting such 
cities. Such a degree is expressed by means of a fuzzy measure. Note that the 
measure is monotone, as the greater the number of cities visited, the greater 
the satisfaction. Boundary conditions mean that no cities visited correspond 
to the lowest satisfaction (equal to zero), and that all cities being visited 
correspond to maximum satisfaction (equal to one). A fuzzy measure for the 
satisfaction of the visitor is given in Table 6.2. 

Then, for finally deciding where to stay, the degree of satisfaction $\mu$ should
be combined with the accessibility of the town from the visitor's location. To
do so, we should express the accessibilities. In other words, for a given visitor's
location, the degree of accessibility is a function of the town to be visited.
That is, if the traveler is located in Tsukuba, then the most accessible town
is Tokyo, followed by Nagano and finally Kyoto. This situation is modeled
by a function $f : X \to D$ such that f(Tokyo) > f(Nagano) > f(Kyoto).
Table 6.3 gives measures for such accessibility from Tsukuba. The values for
the measure are expressed using the same terms as the degree of satisfaction.
That is, f(x) is comparable with $\mu(A)$.

|         | Tokyo | Kyoto | Nagano |
| f       | 0.8   | 0.4   | 0.5    |
| $\mu_f$ | 0.7   | 1     | 0.8    |

Table 6.3. f: Accessibility degrees to $x_i$ from Tsukuba; $\mu_f$: Satisfaction degree for
visiting $x_i$ and all those cities that are at least as accessible as $x_i$

Once we have defined $\mu$ and f, we can consider the degree of satisfaction of
visiting a particular city $x_i$ and all the cities at least as accessible as $x_i$. With
regard to accessibility, we might consider that if we visit $x_i$, it is meaningful
to visit also cities that are easier to visit than $x_i$. For example, if we stay in
Tsukuba and we visit Nagano, then it is reasonable to visit also Tokyo, which
is nearer. For a given $x_i$, the set of such cities is $\{x \mid f(x) \geq f(x_i)\}$. In the
case of visiting Nagano, $\{x \mid f(x) \geq f(Nagano)\}$ = {Tokyo, Nagano}. Then,
we can compute the degree of satisfaction by applying the measure $\mu$ to such
a set. So, the degree of satisfaction for visiting $x_i$ and nearer cities is defined
by

\[ \mu_f(x_i) = \mu(\{x \mid f(x) \geq f(x_i)\}). \]

Accordingly, $\mu_f(Nagano) = \mu(\{Tokyo, Nagano\}) = 0.8$. Similarly, we compute
the values for Tokyo and Kyoto. They are given in Table 6.3.

Now, we consider an overall degree of being in the town of Tsukuba. To
do so, we consider, for all cities $x_i$, the degrees $f(x_i)$ and $\mu_f(x_i)$. That is,
the accessibility of $x_i$ and the satisfaction of visiting $x_i$ as well as cities more
easily accessible. To combine the two values, as in some sense we want both
to be satisfied, we apply a conjunctive approach. That is, we combine them
with the minimum (which, as seen in Section 2.3.1, can be used to model
conjunction). The rationale of the approach is given below by considering two
cases.


Case $f(x_i) > \mu_f(x_i)$: In this case, it is easier to access $x_i$ than the traveler's
satisfaction. Accordingly, the overall degree for $x_i$ cannot be larger than
$\mu_f(x_i)$.

Case $\mu_f(x_i) > f(x_i)$: In this case, accessibility is easier than the corresponding
satisfaction. Nevertheless, the overall degree of the city cannot exceed
that of the cities visited.


So, in both cases we consider the minimum of the two values. Note that
this operation is valid because both degrees are expressed in the same terms
(i.e., they are comparable). Thus, using $\wedge$ to denote the minimum, we use
$f(x_i) \wedge \mu_f(x_i)$ to denote the evaluation of visiting $x_i$.

Taking everything into account, the place $x_i$ with the largest evaluation
$f(x_i) \wedge \mu_f(x_i)$ stands for the evaluation of staying in Tsukuba. This largest
evaluation corresponds to the Sugeno integral of f with respect to $\mu$, which
in this example is equal to

\[ SI_{\mu}(f) = \max_{x_i \in X} f(x_i) \wedge \mu_f(x_i) = 0.7. \]


Note that this overall satisfaction equal to 0.7 is obtained for $x_i$ = Tokyo
(i.e., $\arg\max_{x_i} f(x_i) \wedge \mu_f(x_i)$ = Tokyo), and the cities to be visited will be
{Tokyo, Nagano}. So, using the Sugeno integral, we visit the cities where
accessibility equals satisfaction; cities are selected so that satisfaction
compensates accessibility. In a more general case, with a larger set X, we add
more and more cities, increasing satisfaction and decreasing accessibility until
both degrees are equal.


Note that in this example we have used satisfaction degrees in [0, 1].
Nevertheless, the same would apply if we consider other ordinal scales. In
particular, the same applies if we consider an ordered set $L = \{l_0, \ldots, l_r\}$ with
$l_0 <_L l_1 <_L \cdots <_L l_r$.


Example 6.40. In Example 6.35, we have seen that the WMin and WMax can
be used to combine the certainty degrees of the output fuzzy sets. In the case
where there are interactions among the rules, Sugeno integrals with
appropriately defined fuzzy measures might be used instead.


Example 6.41. Let us consider again Example 6.10 (Section 6.1.2). In this case, 
we will use the Sugeno integral, considering three different fuzzy measures. In 
particular, the measures are defined in Table 6.1. They are reproduced in 
Table 6.4. 


• sat(2,2) = $SI_{\mu_{WM}}$(0, 0.5, 1, 1) = 0.5
• sat(2,3) = $SI_{\mu_{WM}}$(0.5, 0.5, 0.5, 0.5) = 0.5
• sat(2,4) = $SI_{\mu_{WM}}$(1, 0.5, 0, 0.5) = 0.5
• sat(3.5,2.5) = $SI_{\mu_{WM}}$(1, 0.5, 0.5, 0.5) = 0.5
• sat(3,2) = $SI_{\mu_{WM}}$(0.5, 1, 1, 0.5) = 0.5
• sat(3,3) = $SI_{\mu_{WM}}$(1, 1, 0.5, 1) = 0.85

• sat(2,2) = $SI_{\mu_{OWA}}$(0, 0.5, 1, 1) = 0.66
• sat(2,3) = $SI_{\mu_{OWA}}$(0.5, 0.5, 0.5, 0.5) = 0.5
• sat(2,4) = $SI_{\mu_{OWA}}$(1, 0.5, 0, 0.5) = 0.5
• sat(3.5,2.5) = $SI_{\mu_{OWA}}$(1, 0.5, 0.5, 0.5) = 0.5
• sat(3,2) = $SI_{\mu_{OWA}}$(0.5, 1, 1, 0.5) = 0.666
• sat(3,3) = $SI_{\mu_{OWA}}$(1, 1, 0.5, 1) = 1.0

• sat(2,2) = $SI_{\mu_{WOWA}}$(0, 0.5, 1, 1) = 0.5
• sat(2,3) = $SI_{\mu_{WOWA}}$(0.5, 0.5, 0.5, 0.5) = 0.5
• sat(2,4) = $SI_{\mu_{WOWA}}$(1, 0.5, 0, 0.5) = 0.666
• sat(3.5,2.5) = $SI_{\mu_{WOWA}}$(1, 0.5, 0.5, 0.5) = 0.666
• sat(3,2) = $SI_{\mu_{WOWA}}$(0.5, 1, 1, 0.5) = 0.6
• sat(3,3) = $SI_{\mu_{WOWA}}$(1, 1, 0.5, 1) = 1.0

Set                  $\mu_{WM}$   $\mu_{OWA}$   $\mu_{WOWA}$
$\emptyset$          0.0          0.0           0.0
{C4}                 0.05         0.3333        0.0667
{C3}                 0.15         0.3333        0.2
{C3, C4}             0.20         0.6667        0.2667
{C2}                 0.30         0.3333        0.4
{C2, C4}             0.35         0.6667        0.4667
{C2, C3}             0.45         0.6667        0.6
{C2, C3, C4}         0.50         1.0           0.6667
{C1}                 0.50         0.3333        0.6667
{C1, C4}             0.55         0.6667        0.7333
{C1, C3}             0.65         0.6667        0.8667
{C1, C3, C4}         0.70         1.0           0.9333
{C1, C2}             0.80         0.6667        1.0
{C1, C2, C4}         0.85         1.0           1.0
{C1, C2, C3}         0.95         1.0           1.0
{C1, C2, C3, C4}     1.0          1.0           1.0

Table 6.4. Fuzzy measures for Example 6.29
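The evaluations of Example 6.41 can be checked with a direct sketch of Definition 6.37. The names `sugeno` and `build_measure` are our own; the measures are rebuilt from p, w, and w* of Example 6.29 rather than read from the table, and the constraints C1 to C4 are indexed 0 to 3.

```python
from itertools import combinations

def sugeno(values, mu):
    """Discrete Sugeno integral: max_i min(f(x_s(i)), mu(A_s(i)))."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # ascending values
    return max(min(values[i], mu[frozenset(order[pos:])])
               for pos, i in enumerate(order))

def build_measure(n, set_function):
    """Tabulate set_function on every subset of {0, ..., n-1}."""
    return {frozenset(c): set_function(frozenset(c))
            for r in range(n + 1) for c in combinations(range(n), r)}

p = (0.5, 0.3, 0.15, 0.05)
w = (1 / 3, 1 / 3, 1 / 3, 0.0)

mu_wm = build_measure(4, lambda A: sum(p[i] for i in A))
mu_owa = build_measure(4, lambda A: sum(w[:len(A)]))
mu_wowa = build_measure(4, lambda A: min(sum(p[i] for i in A) / 0.75, 1.0))

solutions = {
    (2, 2): (0, 0.5, 1, 1),
    (2, 3): (0.5, 0.5, 0.5, 0.5),
    (2, 4): (1, 0.5, 0, 0.5),
    (3.5, 2.5): (1, 0.5, 0.5, 0.5),
    (3, 2): (0.5, 1, 1, 0.5),
    (3, 3): (1, 1, 0.5, 1),
}

for name, mu in (("WM", mu_wm), ("OWA", mu_owa), ("WOWA", mu_wowa)):
    for pair, a in solutions.items():
        print(name, pair, round(sugeno(a, mu), 3))
```

Only maximum and minimum are applied to the values and the measure, so the same function would work on any totally ordered scale once the measure is given on that scale.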


6.4.1 Properties 


In this section, we review a few results related to Sugeno integrals. We start 
by establishing that the weighted minimum and the weighted maximum are 
particular cases of the Sugeno integral. We show the measures that permit us 
to establish this relation. 


Proposition 6.42. The Sugeno integral generalizes both weighted minimum
and weighted maximum.

1. A weighted maximum with a possibilistic weighting vector u is equivalent
to a Sugeno integral with the fuzzy measure

\[ \mu_u^{wmax}(A) = \max_{x_i \in A} u_i. \]

2. A weighted minimum with a possibilistic weighting vector u is equivalent
to a Sugeno integral with the fuzzy measure

\[ \mu_u^{wmin}(A) = 1 - \max_{x_i \notin A} u_i. \]

Given a possibilistic weighting vector u, the two fuzzy measures generated
above are dual, as shown in Proposition 5.32. That is,

\[ \mu_u^{wmin}(A) = 1 - \mu_u^{wmax}(X \setminus A), \]

and when u is a possibility distribution, $\mu_u^{wmin}$ defines a necessity distribution.
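The equivalences stated in Proposition 6.42 can be checked numerically on the data of Example 6.34. This is a sketch under our own naming (`sugeno`, `mu_wmax`, `mu_wmin`), with elements indexed 0 to 3 and neg(x) = 1 - x.

```python
from itertools import combinations

def sugeno(values, mu):
    """Discrete Sugeno integral of values with respect to the set function mu."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return max(min(values[i], mu[frozenset(order[pos:])])
               for pos, i in enumerate(order))

def wmin(u, a):
    return min(max(1 - ui, ai) for ui, ai in zip(u, a))

def wmax(u, a):
    return max(min(ui, ai) for ui, ai in zip(u, a))

u = (1, 0.5, 0.3, 0.1)
n = len(u)
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

# Possibility-type measure: largest weight inside A (0 for the empty set).
mu_wmax = {A: max((u[i] for i in A), default=0.0) for A in subsets}
# Its dual: one minus the largest weight outside A.
mu_wmin = {A: 1 - max((u[i] for i in range(n) if i not in A), default=0.0)
           for A in subsets}

evaluations = [(0, 0.5, 1, 1), (0.5, 0.5, 0.5, 0.5), (1, 0.5, 0, 0.5),
               (1, 0.5, 0.5, 0.5), (0.5, 1, 1, 0.5), (1, 1, 0.5, 1)]

for a in evaluations:
    print(a, sugeno(a, mu_wmax), wmax(u, a), sugeno(a, mu_wmin), wmin(u, a))
```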


Proposition 6.43. The following hold for all functions f, g, and for any
fuzzy measure $\mu$:

1. if $f \leq g$, then $SI_{\mu}(f) \leq SI_{\mu}(g)$
2. $SI_{\mu}(a, \ldots, a) = a$ for any constant $a \in [0,1]$
3. $SI_{\mu}(f + a) \leq SI_{\mu}(f) + SI_{\mu}(a)$ for any constant $a \in [0,1]$
4. $SI_{\mu}(f \vee g) \geq SI_{\mu}(f) \vee SI_{\mu}(g)$
5. $SI_{\mu}(f \wedge g) \leq SI_{\mu}(f) \wedge SI_{\mu}(g)$


The next proposition is analogous to Proposition 6.30 for the Choquet 
integral. 


Proposition 6.44. Let A be a crisp subset of X; then, the Sugeno integral of
A with respect to $\mu$ is $\mu(A)$.


Now, we consider two representation theorems for the Sugeno integral. We 
will use the comonotonic monotone property established in Definition 6.22 
and the definitions given below. 


Definition 6.45. Let X be a reference set, let a be a value in [0, 1], and let
f, g be functions $f, g : X \to [0,1]$. Then,

• C is minimum homogeneous if and only if, for comonotonic f and g,
  $C(a \wedge f) = a \wedge C(f)$
• C is comonotonic maxitive if and only if, for comonotonic f and g,
  $C(f \vee g) = C(f) \vee C(g)$

The following results can be proved.

Proposition 6.46. Let C be an aggregation operator with the following
properties:

• C is comonotonic monotone
• C is comonotonic maxitive
• C is minimum homogeneous
• C(1, ..., 1) = 1

Then, there exists a fuzzy measure $\mu$ such that C(f) is the Sugeno integral of
f with respect to $\mu$.


Proposition 6.47. Let C be an aggregation operator with the following
properties:

• $C(f \vee g) = C(f) \vee C(g)$
• $C(a \wedge f) = a \wedge C(f)$
• C(1, ..., 1) = 1

Then, there exists a possibility measure Pos such that C(f) is the Sugeno
integral of f with respect to Pos.


Both Choquet and Sugeno integrals integrate functions with respect to 
fuzzy measures. Nevertheless, for most of the functions, their results are differ- 
ent. The following proposition establishes a measure of the difference between 
the two results. 


Proposition 6.48. Let $f : X \to [0,1]$; then,

\[ |SI_{\mu}(f) - CI_{\mu}(f)| \leq 1/4. \]
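The bound of Proposition 6.48 can be explored empirically with the sketch below. It is an illustration, not a proof: `random_measure` is a hypothetical generator of normalized monotone measures of our own design, and the two-element case at the end is a hand-built example in which the difference is exactly 1/4.

```python
import random
from itertools import combinations

def upper_sets(values):
    """Yield (value, indices with value >= it) in ascending order of value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    for pos, i in enumerate(order):
        yield values[i], frozenset(order[pos:])

def choquet(values, mu):
    total, previous = 0.0, 0.0
    for value, upper in upper_sets(values):
        total += (value - previous) * mu[upper]
        previous = value
    return total

def sugeno(values, mu):
    return max(min(value, mu[upper]) for value, upper in upper_sets(values))

def random_measure(n, rng):
    """mu(A) = largest random seed among the nonempty subsets of A (monotone)."""
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
    seed = {A: rng.random() for A in subsets}
    seed[frozenset(range(n))] = 1.0   # boundary condition mu(X) = 1
    return {A: max((seed[B] for B in subsets if B and B <= A), default=0.0)
            for A in subsets}

rng = random.Random(0)
worst = 0.0
for _ in range(2000):
    mu = random_measure(4, rng)
    f = [rng.random() for _ in range(4)]
    worst = max(worst, abs(sugeno(f, mu) - choquet(f, mu)))

# Hand-built case: f = (0.5, 1) with mu({x_2}) = 0.5.
mu_tight = {frozenset(): 0.0, frozenset({0}): 0.5,
            frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
gap = abs(sugeno((0.5, 1.0), mu_tight) - choquet((0.5, 1.0), mu_tight))
print(worst, gap)
```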


6.5 Fuzzy Integrals 


In this section we review two different approaches that have been defined to 
encompass in a unified framework the two families of integrals reviewed in 
this chapter. First, we will present the fuzzy t-conorm integral, and then, the 
twofold integral. The interest of the t-conorm integral is mainly conceptual. 
In the case of the twofold integral, we give an example. 


6.5.1 The Fuzzy t-Conorm Integral 


The Fuzzy Integral was defined to put Choquet integral and Sugeno integral 
into a unified framework. That is, a more general integral that encompasses 
both was introduced. The new integral uses pseudo addition and multipli- 
cation. The operators can be replaced by addition and multiplication in the 
Choquet integral, or maximum and minimum in the Sugeno integral. 

Formally, the fuzzy integral is defined over a tuple called a t-conorm system
for integration, and an operator $-_{\Delta}$, built on one of the elements of the
tuple. Then, three different spaces are considered, each with an associated
t-conorm (t-conorms are used here as pseudo-additions). The following spaces
are considered:

1. The space of values of integrands (F): This domain is denoted by D =
[0,1], and thus the function to integrate is such that $f : X \to D$. The
corresponding t-conorm is denoted by $\Delta$. So, $F = (D, \Delta)$.

2. The space of values of measures (M): Denoting the domain by T = [0, 1],
we have $\mu : \wp(X) \to T$. The corresponding t-conorm is $\perp$. Therefore,
$M = (T, \perp)$.

3. The space of values of integrals (I): In this case, the domain is denoted
by $\underline{T} = [0, 1]$, and the corresponding t-conorm is $\underline{\perp}$. Thus, $I = (\underline{T}, \underline{\perp})$.

Fig. 6.13. The spaces in the fuzzy integrals


In some situations, it is possible to consider that two of the spaces can 
collapse into a single one: 


1. When the integral is understood as the measure of a fuzzy set A, and this
measure is defined to be an integral of the membership function of the
fuzzy set, then $I = ([0,1], \underline{\perp})$ should be equal to $M = ([0,1], \perp)$. This is
so because the integrals, which are valued in I with respect to a measure
$\mu : \wp(X) \to M$, are an extension of the measure $\mu$.

2. When the integral is understood as some kind of expected value of the
function to integrate f, then the space of the integral $([0,1], \underline{\perp})$ and the
integrand $([0,1], \Delta)$ should be the same.


The spaces are illustrated in Figure 6.13 (left), using the expression of 
the Choquet integral. The examples given in this chapter fall into the second 
category: the integral is a kind of expected value. 

The decomposition presented here is not the only one possible. Other authors
include an additional space. This situation is represented in Figure 6.13
(right), and the new space, denoted by $I^*$, is where the products of F by M
are operated. In this case, $I^*$ is an internal space for operation. Together with
these three spaces, a product-like operation $\odot : D \times T \to \underline{T}$ is considered.
This is the pseudo multiplication mentioned above.

The t-conorms presented here and the operation $\odot$ define, when some
conditions are fulfilled, a t-conorm system. This is formally defined as follows.


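Definition 6.49. $\mathcal{F} = (\Delta, \perp, \underline{\perp}, \odot)$ is a t-conorm system for integration if
and only if

1. $\Delta$, $\perp$, and $\underline{\perp}$ are continuous t-conorms that are the maximum or Archimedean.
2. $\odot : ([0,1], \Delta) \times ([0,1], \perp) \to ([0,1], \underline{\perp})$ is a product-like operation fulfilling

   a) $\odot$ is continuous on $(0,1]^2$
   b) $a \odot x = 0$ if and only if a = 0 or x = 0
   c) when $x \perp y < 1$, $a \odot (x \perp y) = (a \odot x) \,\underline{\perp}\, (a \odot y)$ for all $a \in [0,1]$
   d) when $a \,\Delta\, b < 1$, $(a \,\Delta\, b) \odot x = (a \odot x) \,\underline{\perp}\, (b \odot x)$ for all $x \in [0,1]$.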

The definition only considers t-conorms that are either the maximum
or Archimedean. In the case where $\Delta$, $\perp$, and $\underline{\perp}$ are continuous Archimedean
t-conorms, we will denote their generators by k, g, and h.

According to the type of t-conorms $\Delta$, $\perp$, and $\underline{\perp}$, four types of t-systems
can be distinguished:

Type (i): $\Delta$, $\perp$, and $\underline{\perp}$ are Archimedean t-conorms.

Type (ii): $\Delta$, $\perp$, and $\underline{\perp}$ are the maximum.

Type (iii): $\underline{\perp}$ is an Archimedean t-conorm, and at least one of the others is
the maximum.

Type (iv): $\underline{\perp}$ is the maximum, and at least one of the others is Archimedean.





Theoretical results show that only types (i) and (ii) are meaningful. In the
other cases, the product-like operation becomes constant, or the product is
a function that only depends on one of the arguments (the other argument
does not affect the result). So, in such cases, the integral is not very useful for
aggregation.

We now review some results relating to the four types of t-systems, and
we show that only (i) and (ii) are of interest.


Proposition 6.50. Let $\mathcal{F} = (\Delta, \perp, \underline{\perp}, \odot)$ be a t-system of type (i). Then,
\[ a \odot x = 1 \quad \text{for all } a > 0 \text{ and } x > 0 \]
or
\[ a \odot x = h^{(-1)}(k(a) \cdot g(x)) \quad \text{for all } a \text{ in } [0,1] \text{ and for all } x \text{ in } [0,1], \tag{6.17} \]

where k, g, and h are generators of $\Delta$, $\perp$, and $\underline{\perp}$ and where $h^{(-1)}$ stands
for the quasi-inverse of h.

This case corresponds to Type (i) above. When a t-system is defined with
three Archimedean t-conorms, and the product-like operator $\odot$ is defined
according to Equation 6.17, we will say that we have an Archimedean t-system.
Note that, in this case, the t-system is completely determined by the generators
(k, g, h).


Proposition 6.51. Let $\mathcal{F} = (\Delta, \perp, \underline{\perp}, \odot)$ be a t-system of type (ii). Then,
when $\odot$ is a nondecreasing operator (i.e., if $x > y$, then $z \odot x \geq z \odot y$ for all
x, y, and z in [0,1], and likewise in relation to the first argument of $\odot$),
equations (c) and (d) in Definition 6.49 hold.

This situation corresponds to Type (ii) above. In this case, a non-decreasing
operator $\odot$ satisfying (a) and (b) in Definition 6.49 defines a t-conorm system
for integration, with $\Delta = \perp = \underline{\perp}$ = maximum. This is a maximum type
t-system (a maximum t-system for short).





Proposition 6.52. Let $\mathcal{F} = (\Delta, \perp, \underline{\perp}, \odot)$ be a t-system of type (iii). Then,
$a \odot x = 1$ for all $a > 0$ and $x > 0$.


Proposition 6.53. Let $\mathcal{F} = (\Delta, \perp, \underline{\perp}, \odot)$ be a t-system of type (iv).
If $\perp$ is an Archimedean t-conorm, then
\[ a \odot x = a \odot 1 \quad \text{for all } a \text{ in } [0,1] \text{ and } x > 0. \]
If $\Delta$ is an Archimedean t-conorm, then
\[ a \odot x = 1 \odot x \quad \text{for all } x \text{ in } [0,1] \text{ and } a > 0. \]
Furthermore, if $\Delta$ and $\perp$ are both Archimedean t-conorms, then
\[ a \odot x = 1 \odot 1 \quad \text{for all } a > 0 \text{ and } x > 0. \]

Let X be a reference set, let $(X, \mathcal{A})$ be a measurable space, let $\mu$ be a fuzzy measure
on $(X, \mathcal{A})$, let f be a measurable function $f : X \to [0,1]$, and let $\mathcal{F} = (\Delta, \perp, \underline{\perp}, \odot)$
be a t-system for integration; then, the fuzzy t-conorm integral (or fuzzy t-integral)
of f based on $(\Delta, \perp, \underline{\perp}, \odot)$ with respect to $\mu$ is defined as

\[ (\mathcal{F})\int f \odot d\mu := \lim_{n \to \infty} (\mathcal{F})\int f_n \odot d\mu, \]

where $\{f_n\}$ is a nondecreasing sequence of simple functions which pointwise converge
to f.

Fig. 6.14. Fuzzy t-conorm integral in a continuous domain


These results show that only for types (i) and (ii) is the product-like 
operator really a function of the two arguments. Therefore, we only consider 
types (i) and (ii) below. The definition is based on the subtraction operator 
$-_\Delta$ constructed from the t-conorm $\Delta$ (recall Definition 2.51). 


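Definition 6.54. Let $\mu$ be a fuzzy measure on X, and let $\mathcal{F} = (\Delta, \bot, \underline{\bot}, \otimes)$ 
be a t-system for integration. Then, the fuzzy t-conorm integral (or fuzzy 
t-integral) of a function $f : X \to [0,1]$ based on $(\Delta, \bot, \underline{\bot}, \otimes)$ with respect to $\mu$ 
is defined by: 


$(\mathcal{F}) \int f \otimes d\mu = \underline{\bot}_{i=1}^{N} \, (a_i -_\Delta a_{i-1}) \otimes \mu(A_{s(i)}),$ 


where $a_i = f(x_{s(i)})$ with $f(x_{s(i)}) \leq f(x_{s(i+1)})$ and $a_0 = f(x_{s(0)}) = 0$, and 
$A_{s(i)} = \{x_{s(i)}, \ldots, x_{s(N)}\}$. 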


This definition can be interpreted in a way similar to the one for the 
Choquet integral displayed in Figure 6.8 (c). See Figure 6.15. In this case, 
$(a_i -_\Delta a_{i-1})$ corresponds to a way to measure the height of the block, and 
$\mu(A_{s(i)})$, as before, corresponds to the measure of the elements comprising 
the block. The operator $\otimes$ is a way to combine the two values to evaluate the 
block. 

For the Choquet integral, the evaluation roughly corresponds to the area, 
while for the Sugeno integral, it corresponds to the shortest length between 
$a_i$ (note that $a_i -_\Delta a_{i-1} = a_i$) and $\mu(A_{s(i)})$. Figure 6.14 corresponds to the 
definition of this integral for continuous functions. 
Fig. 6.15. Interpretation of the fuzzy integral 


The expression of the fuzzy t-integral can be rewritten for the two types 
of t-conorms considered above: Archimedean and maximum. 


Proposition 6.55. Let $\mu$ be a fuzzy measure on X, and let $\mathcal{F} = (\Delta, \bot, \underline{\bot}, \otimes)$ 
be a t-system for integration. Then, the following holds for the fuzzy t-integral 
of a function $f : X \to [0,1]$ based on $\mathcal{F} = (\Delta, \bot, \underline{\bot}, \otimes)$ with respect to $\mu$: 


Type (i): When the system is Archimedean, and k, g, and h are the generators 
of $\Delta$, $\bot$, and $\underline{\bot}$, the following equality holds: 


$(\mathcal{F}) \int f \otimes d\mu = h^{-1}\left(\min\left(h(1), (C) \int k \circ f \, d(g \circ \mu)\right)\right),$ 


where $\circ$ is function composition. 
Type (ii): When the system is a maximum t-system and $\otimes$ is a t-norm, the 
following equality holds: 


$(\mathcal{F}) \int f \otimes d\mu = \bigvee_{i=1}^{N} a_i \otimes \mu(A_{s(i)}).$ 


The first expression reduces to the Choquet integral when k = g = h = 
id, and the second expression reduces to the Sugeno integral when $\otimes$ is the 
minimum. With $\otimes$ being the product, the fuzzy t-integral computes the area 
of the square with maximum area. 
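
The type (ii) expression can be illustrated with a short Python sketch for a finite set. The dictionary representation of the fuzzy measure (keys are frozensets of indices) and the names `cardinality_measure` and `sugeno_like_integral` are assumptions made for this illustration only; they are not taken from the text.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence

Measure = Dict[FrozenSet[int], float]


def cardinality_measure(n: int, g: Callable[[float], float]) -> Measure:
    """Hypothetical helper: mu(A) = g(|A| / n) on all subsets of {0, ..., n-1}.

    g is assumed nondecreasing with g(0) = 0 and g(1) = 1.
    """
    return {
        frozenset(subset): g(size / n)
        for size in range(n + 1)
        for subset in combinations(range(n), size)
    }


def sugeno_like_integral(
    values: Sequence[float],
    mu: Measure,
    tnorm: Callable[[float, float], float],
) -> float:
    """Maximum over i of a_i (tnorm) mu(A_s(i)), with a_1 <= ... <= a_N."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = 0.0
    for pos, idx in enumerate(order):
        upper_set = frozenset(order[pos:])  # A_s(i) = {x_s(i), ..., x_s(N)}
        result = max(result, tnorm(values[idx], mu[upper_set]))
    return result


values = [0.3, 0.9, 0.6]
mu = cardinality_measure(3, lambda t: t)
sugeno = sugeno_like_integral(values, mu, min)                    # minimum as t-norm
max_area = sugeno_like_integral(values, mu, lambda a, m: a * m)   # product as t-norm
```

With the minimum the function returns the Sugeno integral of the inputs; with the product it returns the largest of the products $a_i \cdot \mu(A_{s(i)})$.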

Some particular cases of the expression above are known in the literature 
as Choquet-like and Sugeno-like fuzzy integrals. They are formally defined as 
follows: 


1. Choquet-like is a fuzzy t-integral with type (i) t-system, and such that 
$a \otimes x = 1$ if and only if a = 1 and x = 1. This is equivalent to $h(1) = k(1) \cdot g(1)$. 

Therefore, when the generators of the Archimedean t-conorms $\Delta$, $\bot$, and 
$\underline{\bot}$ are, respectively, k, g, and h, we have that the Choquet-like integrals 
have the following expression: 

$h^{-1}\left(h(1) \wedge (C) \int k \circ f \, d(g \circ \mu)\right).$ (6.18) 

For example, for $h(x) = k(x) = x^\alpha$ and $g(x) = x$, we have 

$\left(1 \wedge (C) \int f^\alpha \, d\mu\right)^{1/\alpha}$ (6.19) 
or, using Equation 6.2, 

$\left(\sum_{i=1}^{N} a_{s(i)}^{\alpha} \left[\mu(A_{s(i)}) - \mu(A_{s(i+1)})\right]\right)^{1/\alpha}.$ (6.20) 


2. Sugeno-like is a fuzzy t-integral with type (ii) t-system, with $\otimes$ a t-norm. 
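
As an illustration of the Choquet-like case with power generators, the following Python sketch computes the $1/\alpha$ power of the Choquet integral of $f^\alpha$, truncated at 1. It assumes a finite set, $\alpha > 0$, and a fuzzy measure stored as a dictionary keyed by frozensets of indices; the function names are hypothetical.

```python
from itertools import combinations


def cardinality_measure(n, g):
    """Hypothetical helper: mu(A) = g(|A| / n) on all subsets of {0, ..., n-1}."""
    return {
        frozenset(subset): g(size / n)
        for size in range(n + 1)
        for subset in combinations(range(n), size)
    }


def choquet_integral(values, mu):
    """Sum over i of (a_i - a_{i-1}) * mu(A_s(i)), with a_0 = 0 and a_i increasing."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, previous = 0.0, 0.0
    for pos, idx in enumerate(order):
        upper_set = frozenset(order[pos:])
        total += (values[idx] - previous) * mu[upper_set]
        previous = values[idx]
    return total


def choquet_like_power(values, mu, alpha):
    """Choquet-like integral for h(x) = k(x) = x**alpha and g(x) = x (alpha > 0)."""
    powered = [v ** alpha for v in values]
    return min(1.0, choquet_integral(powered, mu)) ** (1.0 / alpha)


values = [0.3, 0.9, 0.6]
mu = cardinality_measure(3, lambda t: t)
plain = choquet_like_power(values, mu, 1.0)      # alpha = 1 gives the Choquet integral
quadratic = choquet_like_power(values, mu, 2.0)  # root-mean-square flavor
```

With this particular (additive, uniform) measure the Choquet integral is the arithmetic mean, so the second call returns the quadratic mean of the inputs.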


6.5.2 Twofold Integral 


The twofold integral is an alternative generalization for both Choquet and 
Sugeno integrals. Informally, the t-conorm integral builds its generalization by 
assuming that the two fuzzy measures in the Choquet and Sugeno integrals 
can collapse into a single measure in the generalized integral. In contrast, in 
the twofold integrals, the two fuzzy measures are kept as they are. 

The rationale of this approach is that the semantics of the two measures 
are different. In particular, the measure in the Choquet integral is seen as a 
"probabilistic flavor" measure, and the one in the Sugeno integral is seen as a 
"fuzzy flavor" measure. So, the definition keeps both measures, including both 
fuzzy and probabilistic flavors. We will use $\mu_C$ to denote the measure that 
corresponds to the one in the Choquet integral (the one with the probabilistic 
flavor), and $\mu_S$ for the one in the Sugeno integral (fuzzy flavor). 


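Definition 6.56. Let $\mu_C$ and $\mu_S$ be two fuzzy measures on X; then, the 
twofold integral of a function $f : X \to [0,1]$ with respect to the fuzzy measures 
$\mu_S$ and $\mu_C$ is defined by 


$TI_{\mu_S,\mu_C}(f) = \sum_{i=1}^{n} \left( \left( \bigvee_{j=1}^{i} f(x_{s(j)}) \wedge \mu_S(A_{s(j)}) \right) \left( \mu_C(A_{s(i)}) - \mu_C(A_{s(i+1)}) \right) \right),$ 


where s in $f(x_{s(i)})$ indicates that the indices have been permuted so that 
$0 \leq f(x_{s(1)}) \leq \cdots \leq f(x_{s(n)}) \leq 1$, and where $A_{s(i)} = \{x_{s(i)}, \ldots, x_{s(n)}\}$ and 
$A_{s(n+1)} = \emptyset$. 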
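
The definition above can be sketched in Python for a finite set as follows. This is only an illustration: the dictionary representation of the measures (keys are frozensets of indices, including the empty set) and the helper `cardinality_measure` are assumptions, not part of the text.

```python
from itertools import combinations


def cardinality_measure(n, g):
    """Hypothetical helper: mu(A) = g(|A| / n) on all subsets of {0, ..., n-1}."""
    return {
        frozenset(subset): g(size / n)
        for size in range(n + 1)
        for subset in combinations(range(n), size)
    }


def twofold_integral(values, mu_s, mu_c):
    """Twofold integral of `values` with respect to mu_s (Sugeno) and mu_c (Choquet)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = 0.0
    inner = 0.0  # running maximum over j <= i of f(x_s(j)) ∧ mu_S(A_s(j))
    for pos, idx in enumerate(order):
        upper_set = frozenset(order[pos:])           # A_s(i)
        next_upper_set = frozenset(order[pos + 1:])  # A_s(i+1), empty at the end
        inner = max(inner, min(values[idx], mu_s[upper_set]))
        total += inner * (mu_c[upper_set] - mu_c[next_upper_set])
    return total


values = [0.3, 0.9, 0.6]
mu_linear = cardinality_measure(3, lambda t: t)
mu_square = cardinality_measure(3, lambda t: t * t)
mu_star = cardinality_measure(3, lambda t: 1.0 if t > 0 else 0.0)  # ignorance

ti = twofold_integral(values, mu_linear, mu_square)
ti_sugeno_case = twofold_integral(values, mu_linear, mu_star)   # mu_C = mu*
ti_choquet_case = twofold_integral(values, mu_star, mu_linear)  # mu_S = mu*
```

The last two calls use the ignorance measure for one of the two arguments, so that the Sugeno part or the Choquet part of the definition can be inspected in isolation.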


Now, we turn to the properties of this integral. We start by considering 
the relation between the twofold integral and the Choquet and Sugeno integrals. 
We show that this integral is a proper generalization of the Sugeno and 
Choquet integrals, as, for a particular measure, the twofold integral reduces to 
one of the others. In particular, the generalization is obtained using the measure 
$\mu^*$ given in Definition 5.5, which corresponds to ignorance ($\mu^*(A) = 1$ when $A \neq \emptyset$, 
and $\mu^*(\emptyset) = 0$). 
Fig. 6.16. Graphical representation of fuzzy integrals: (a) Sugeno integral; (b) 
twofold integral 


Proposition 6.57. The twofold integral satisfies the following properties. 
When $\mu_C = \mu^*$, the twofold integral reduces to the Sugeno integral: 


$TI_{\mu_S,\mu_C}(a_1, \ldots, a_n) = SI_{\mu_S}(a_1, \ldots, a_n).$ 

When $\mu_S = \mu^*$, the twofold integral reduces to the Choquet integral: 

$TI_{\mu_S,\mu_C}(a_1, \ldots, a_n) = CI_{\mu_C}(a_1, \ldots, a_n).$ 


When $\mu_C = \mu_S = \mu^*$, the twofold integral reduces to the maximum: 


$TI_{\mu_S,\mu_C}(a_1, \ldots, a_n) = \vee(a_1, \ldots, a_n).$ 


Additionally, the twofold integral satisfies the basic properties of aggrega- 
tion operators. That is, it is monotonic, satisfies unanimity, and, therefore, 
yields a value between the minimum and the maximum. 


Proposition 6.58. Let X be a finite set and let $\mu_C$ and $\mu_S$ be two fuzzy 
measures on X; then, 


(i) for all functions f and g over X such that $g(x) \geq f(x)$ for all $x \in X$, 


$TI_{\mu_S,\mu_C}(g) \geq TI_{\mu_S,\mu_C}(f);$ 


(ii) for all a = (a, ..., a), 

$TI_{\mu_S,\mu_C}(a) = a;$ 

(iii) for all functions f on X, 


$\min f(x) \leq TI_{\mu_S,\mu_C}(f) \leq \max f(x).$ 


The next few propositions establish some additional properties of the 
twofold integral. 


Proposition 6.59. Let A be a subset of X and let $f_A$ be its characteristic 
function; then, the twofold integral of $f_A$ with respect to the two measures $\mu_S$ 
and $\mu_C$ is equal to 


$TI_{\mu_S,\mu_C}(f_A) = \mu_S(A) \cdot \mu_C(A).$ 




Let X be a reference set, let $(X, \mathcal{A})$ be a measurable space, and let $\mu_C$ and $\mu_S$ be 
fuzzy measures on $(X, \mathcal{A})$. Then, for a measurable function $f : X \to [0,1]$, let us 
define $\varphi_f : [0,1] \to [0,1]$ by 


$\varphi_f(a) = \bigvee_{0 \leq r \leq a} \left( r \wedge \mu_S(\{f > r\}) \right).$ 


Note that $\varphi_f(1) = SI_{\mu_S}(f)$ and $\varphi_f$ is nondecreasing, so the set of discontinuity 
points of $\varphi_f$ is at most countable. $\varphi_f$ permits us to define a Lebesgue-Stieltjes 
measure $\nu_{\varphi_f}$ on the real line by 


$\nu_{\varphi_f}([a, b]) := \varphi_f(b + 0) - \varphi_f(a - 0).$ 


Then, the twofold integral of a measurable function $f : X \to [0,1]$ with respect to 
fuzzy measures $\mu_S$ and $\mu_C$ is defined by 


$TI_{\mu_S,\mu_C}(f) = \int \mu_C(f > a) \, d\nu_{\varphi_f}(a).$ 


Fig. 6.17. Twofold integral in a continuous domain 


Proposition 6.60. For all f, the following inequalities hold for the Choquet 
and Sugeno integrals: 


$TI_{\mu_S,\mu_C}(f) \leq CI_{\mu_C}(f)$ 


$TI_{\mu_S,\mu_C}(f) \leq SI_{\mu_S}(f)$ 


Additionally, it can be proved that the following relation holds: 


Proposition 6.61. For all f, $\mu_C$, and $\mu_S$, 


$TI_{\mu_S,\mu_C}(f) = CI_{\mu_C}(f \wedge SI_{\mu_S}(f)).$ 


Figure 6.16 gives a graphical representation of the twofold integral. The 
figure includes (left) the representation of the Sugeno integral (already given 
in Figure 6.11) and (right) the representation of the twofold integral (filled 
part). Figure 6.17 gives the definition of the twofold integral in a continuous 
domain. 


6.6 Hierarchical Models for Aggregation 


Aggregation operators can be combined to obtain hierarchical models. This 
corresponds to computing partial aggregations, and then combining them into 
an overall score. 

Hierarchical models are appropriate to decompose a complex problem into 
simpler ones and to improve modularity. Additionally, when the number of 
information sources is large, the number of parameters to tune the system 
might become very large. This is the case with fuzzy integrals, which require 
fuzzy measures. As we have seen in Chapter 5, fuzzy measures are power set 
functions, and therefore, for N sources, $2^N$ parameters have to be fixed. The 
use of hierarchical models defined in terms of fuzzy integrals permits us to 
reduce the number of parameters. 


Fig. 6.18. Hierarchical models for aggregation with 4 inputs: (a) non-hierarchical 
case, (b) hierarchical case, (c) hierarchical case with duplicated inputs (overlapping 
hierarchical model) 


Figure 6.18 illustrates this case for four input variables. The figure includes 
the case of one-step (one aggregation operator) and two-step (two partial 
aggregation operators and a general combination) models. In this case, when 
fuzzy integrals are used, the definition of the parameters (fuzzy measures) 
requires $2^4 - 2$ (considering boundary conditions) values for the nonhierarchical 
model, but only six values for the parameters of the hierarchical model. 

More complex models can also be considered. This is the case in Figure 6.18 
(c), where the inputs are duplicated. In this case, one aggregation combines 
four inputs, and the other aggregation combines the original four inputs and 
the result of the previous aggregation. So, the number of parameters is even 
larger than in the nonhierarchical model (i.e., $(2^4 - 2) + (2^5 - 2)$). In case of 
duplication in the inputs, we call the models overlapping hierarchical models. 
The term separated hierarchical model will be used for the nonoverlapping 
case. 
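
The parameter counts quoted above can be checked with a few lines of Python. The sketch assumes, as the text suggests, that the separated model consists of two partial aggregations of two inputs each followed by one aggregation of the two partial results; the function name is hypothetical.

```python
def measure_parameters(n_inputs: int) -> int:
    """Free values of a fuzzy measure on n_inputs elements.

    The values for the empty set (0) and the whole set (1) are fixed by the
    boundary conditions, so they are not counted.
    """
    return 2 ** n_inputs - 2


# (a) one fuzzy integral over the four inputs
non_hierarchical = measure_parameters(4)

# (b) two partial aggregations of two inputs each, then one final aggregation
separated = 2 * measure_parameters(2) + measure_parameters(2)

# (c) one aggregation of four inputs, then one of the four inputs plus that result
overlapping = measure_parameters(4) + measure_parameters(5)

print(non_hierarchical, separated, overlapping)  # prints: 14 6 44
```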

The twofold integral studied in the previous section can be considered in 
the light of hierarchical models. Proposition 6.61, which linked the twofold 
integral with the Choquet and Sugeno integrals, permits us to consider a 
two-step hierarchical model with the Sugeno integral in the first step and the 
Choquet integral in the second step. This hierarchical model corresponds to 
the one in Figure 6.18 (c). 

Some results have been obtained about the modeling capabilities of such 
hierarchical models. We consider multistep hierarchical models in which the 
aggregation operator is the Choquet integral. 


Definition 6.62. Let $f : X \to \mathbb{R}^+$ be a function that represents the data to 
be aggregated. Then, 


• If $C(f(x_1), \ldots, f(x_N)) = f(x_i)$ for some $x_i$, then C is a 0-step Choquet 
integral. 
• If $C_j$ for $j \in M = \{1, \ldots, m\}$ are $k_j$-step Choquet integrals, then, given a 
fuzzy measure $\mu$ on M, the function 


$C(f(x_1), \ldots, f(x_N)) = CI_\mu(C_1(f(x_1), \ldots, f(x_N)), \ldots, C_m(f(x_1), \ldots, f(x_N)))$ 

is a k-step Choquet integral for $k = 1 + \max\{k_j \mid j \in M\}$. 
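
A minimal Python sketch of a two-step Choquet integral in the sense of the definition above is given next. It assumes a finite set and fuzzy measures stored as dictionaries keyed by frozensets of indices; the helper names and the particular measures (additive ones in the first step, a 0-1 measure in the second) are choices made for this illustration.

```python
from itertools import combinations


def all_subsets(n):
    """All subsets of {0, ..., n-1} as tuples."""
    return [s for size in range(n + 1) for s in combinations(range(n), size)]


def additive_measure(weights):
    """mu(A) = sum of the weights of the elements of A (weights sum to 1)."""
    return {frozenset(s): sum(weights[i] for i in s)
            for s in all_subsets(len(weights))}


def ignorance_measure(n):
    """0-1 measure with mu(A) = 1 for every nonempty A."""
    return {frozenset(s): (1.0 if s else 0.0) for s in all_subsets(n)}


def choquet_integral(values, mu):
    """Sum over i of (a_i - a_{i-1}) * mu(A_s(i)), with a_0 = 0 and a_i increasing."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, previous = 0.0, 0.0
    for pos, idx in enumerate(order):
        total += (values[idx] - previous) * mu[frozenset(order[pos:])]
        previous = values[idx]
    return total


def two_step_choquet(values, first_step_measures, second_step_measure):
    """Aggregate the inputs with each first-step measure, then combine the results."""
    partial = [choquet_integral(values, m) for m in first_step_measures]
    return choquet_integral(partial, second_step_measure)


values = [0.2, 0.8]
first_step = [additive_measure([0.5, 0.5]), additive_measure([1.0, 0.0])]
result = two_step_choquet(values, first_step, ignorance_measure(2))
```

Here the first step computes two weighted means (0.5 and 0.2) and the second step, with the ignorance measure, returns their maximum.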


An important result about hierarchical models is that when we have an 
m-step overlapping model, it can always be reduced to an equivalent two-step 
model. Moreover, some constraints can be given to the corresponding fuzzy 
measures. 


Theorem 6.63. The following conditions hold: 


(i) Every multistep Choquet integral is a monotone increasing, positively ho- 
mogeneous, piecewise linear function. 

(ii) Every monotone increasing, positively homogeneous, piecewise linear function 
on a full-dimensional convex set in $\mathbb{R}^N$ is representable as a two-step 
Choquet integral such that the fuzzy measures of the first step are additive 
and the fuzzy measure of the second step is a 0-1 fuzzy measure. 


So, in principle, in the case of Choquet integrals, no model more complex 
than a two-step one is needed, as all other models can be reduced to it. 


6.7 Bibliographical Notes 


1. General references on aggregation operators: General references 
for aggregation operators include the books by Grabisch, Nguyen, and 
Walker [172], Bullen, Mitrinovic, and Vasic [50], Bullen [51], and Calvo, 
Mayor and Mesiar [56]. See also [111], [55], and [52], which give reviews of 
aggregation operators. The books on fuzzy measures and integrals, [174], 
[427], and [385], are also adequate. The chapter by Benvenuti, Mesiar, and 
Vivona [41] in the Handbook of Measure Theory by Pap [312] includes 
definitions for fuzzy integrals and some of their properties. General books 
on measure and integration are also appropriate. Some of them include 
some integrals (e.g., Konig [216] describes the Choquet integral). 

Applications of aggregation operators can be found in several papers. 
[389, 414] consider the use of fuzzy integrals to combine the results of 
fuzzy-rule-based systems. The former considers both Choquet and Sugeno 
integrals, and the latter focuses on the Sugeno integral. [374] gives an 
exhaustive comparison of operators (67 parameterized operators) with 
respect to their performance in image retrieval; not only aggregation op- 
erators, but also other operators, such as t-norms and t-conorms, are con- 
sidered. Example 6.8 is based on [366] and [396]. 

For other applications, see the corresponding bibliographical notes in 
Chapter 1. 
2. Median: Definition 6.7 is the standard definition for the median. 
Nevertheless, other definitions are possible when N is even. This matter is 
briefly discussed in the bibliographical notes of Chapter 1. The results by 
Jackson (1921) [201] are recalled there, as they are of interest for alternative 
definitions for even N. 


3. OWA operators: The first definition of the OWA operator is given 
in [442]. See also [444]. Example 6.6, on the use of OWA to model the 
score system in the Olympic Games, is due to Yager. [145] presents a 
characterization of the operators. Some of their mathematical properties, 
as well as some applications, are described in [456]. 

The use of fuzzy quantifiers to define weights appears in [442] and [444]. 
Liu in [228] proposed the study of the generating functions of fuzzy quan- 
tifiers. Liu studied, among other properties, their orness. 

Several generalizations exist for the OWA operators. For example, 
the quasi-OWA operator (introduced by Fodor, Marichal, and Roubens 
in [145]): 

$\varphi^{-1}\left(\sum_{i=1}^{N} w_i \varphi(a_{\sigma(i)})\right)$ 


Ralescu and Ralescu [328] (Example 2, p. 326) introduced in 1997 the 
Geometric OWA (GOWA). This operator, named in [74] for its similarity 
with the geometric mean, is a geometric mean of order statistics and 
corresponds to 

$\prod_{i=1}^{N} a_{\sigma(i)}^{w_i},$ 

where the $w_i$ define a weighting vector ($\sum_i w_i = 1$ and $w_i \geq 0$). This operator 
has been further studied in [185, 231, 440]. 

Other generalizations of the OWA operator include the Nonmonotonic 
OWA [449], the BADD-OWA [454], the Generalized OWA, the Induced 
OWA (IOWA) [271, 455], and the Induced GOWA (IGOWA) [75]. The 
BADD-OWA (BAsic Defuzzification Distribution OWA) corresponds to 
the counter-harmonic mean. The Generalized OWA, introduced by Yager 
in [452], corresponds to the root-mean-power (generalized mean) of order 
statistics. For a given $\alpha$, it is defined as 


$\left(\sum_{i=1}^{N} w_i a_{\sigma(i)}^{\alpha}\right)^{1/\alpha}$ 


The IOWA is an OWA operator where the ordering is defined in terms 
of a priority vector $b = (b_1, \ldots, b_N)$, where $b_i$ corresponds to the priority 
of $x_i$. The IOWA of a with respect to w and b is $\sum_{i=1}^{N} w_i a_{\sigma(i)}$, where 
$(\sigma(1), \ldots, \sigma(N))$ is a permutation of $\{1, \ldots, N\}$ such that $b_{\sigma(i-1)} \geq b_{\sigma(i)}$ 
for all $i \in \{2, \ldots, N\}$. This operator, named by Yager and Filev in [455], 
was introduced by Mitchell and Estrakh [271] (see also [347] and [272]) in 
1997. The Induced-Choquet integral (I-COA) was introduced in [450]. 


4. On the weights for the OWA operator and weighted means: 
Considering data with extreme values as outliers and removing them before 
applying a mean has been used for a long time. For example, the use of the 
arithmetic mean without the two extreme values — which corresponds to 
applying the OWA operators with weights (0, 1/(N —2),..., 1/(N —2),0) 
— is rather old. Maire and Boscovich used this approach in [243] (1755) for 
measuring the degrees of a meridian. Svanberg (?) [20], in 1821, reports an 
example of applying this approach for computing the mean returns from 
some real estate properties in French provinces. 

In 1722, Cotes [85] considered the case of weighting observations 
according to the reciprocal of their errors. He gives an example (p. 22): 
“Sit p locus Objecti alicujus ex Observatione prima definitus, q, r, s ejusdem 
Objecti loca ex Observationibus subsequentibus; sint insuper P, Q, 
R, S pondera reciproce proportionalia spatiis Evagationum, per quae se 
diffundere possint Errores ex Observationibus singulis prodeuntes, quaeque 
dantur ex datis Errorum Limitibus; & ad puncta p, q, r, s posita intelligantur 
pondera P, Q, R, S, & inveniatur eorum gravitatis centrum Z: 
dico punctum Z fore Locum Objecti maxime probabilem, qui pro vero ejus 
loco tutissime haberi potest².” 

Newcomb [299], in 1912, argues that rejecting observations with large 
residuals results in discontinuity. Then, he proposes using a weight 
$w = \epsilon_0/(\epsilon_0 + \Delta)$, where $\Delta$ is the excess of the error above a certain limit 
$\epsilon_0$. Later, in 1926, Bemporad [40] explicitly establishes the property that 
equal credibility on measures should imply symmetry in the function. On 
p. 88 he states: “Se i risultati delle singole misure presentano ugual grado 
di attendibilità, il risultato complessivo deve essere funzione simmetrica 
di essi³.” 

5. WOWA operator: Torra introduced the WOWA operator in [394] 
and [395]. An interpolation method to build W* from the weighting vec- 
tor w was considered in [402]. This method adapted the one by Chen 
and Otto in [71]. As Beliakov [37] pointed out, this method by Chen and 
Otto is equivalent to [256], but the latter is more efficient. Nevertheless, 
the adaptation in [402] makes the results of both methods different in the 
boundaries of W* (see [404]); that is, near the points (0,0) and (1, 1). 


² Let p be the precise position of a particular object according to the first 
observation, and q, r, s be the positions of this same object according to subsequent 
observations; besides let P, Q, R, and S be weights inversely proportional to the 
spaces of the evagations, through which the errors that progress from each of the 
observations can be spread, and which are given from the boundaries of the errors 
given; and let weights P, Q, R, and S be understood as located at points p, q, r, 
and s, and let Z be at the gravity center of these points. I assert that point Z 
should be the most probable place of the object, which can be said, with complete 
certainty, as the real place of the object. 

³ If the results of individual measures show the same degree of reliability, the overall 
result must be a symmetric function of such results. 
The definition of WOWA using quantifiers is equivalent to the OWA, 
with importances defined in [447]. In [397], it was proved that the WOWA 
operator is a particular case of the Choquet integral. 

Liu studies some properties of the WOWA in [229], focusing on continu- 
ous WOWA. Continuous WOWA was introduced in [410]. m-dimensional 
WOWA was introduced in [291]. 


6. The Choquet integral: The first definition of the Choquet integral was 
due to Vitali in 1925 [424] for additive measures. Choquet defined it 
independently in 1953 [80]. The term horizontal integration is used in [216]. 
The generalized Choquet integral defined by Yager in [452] corresponds 
to the Choquet-like integral defined in Equation 6.20. Properties of the 
Choquet integral can be found in general references on fuzzy measures 
and integrals. 

Comonotonic additivity for Choquet integral was introduced by Del- 
lacherie [93], and, later, Schmeidler [348] proved the representation the- 
orem for a comonotonically additive functional. The representation the- 
orems for functionals on restricted domains are shown by Greco in [175] 
and by Narukawa, Murofushi, and Sugeno [289]. 

An extension of the domain of the Choquet integral is proposed by 
Šipoš [365]. The Choquet integral with respect to a nonmonotonic fuzzy 
measure is proposed by Murofushi, Sugeno, and Machida [284]. 


7. Weighted minimum and weighted maximum: Weighted minimum 
was introduced by Yager in 1981 [441]. Previously, in 1976, Negoita and 
Flondor [295] had given a generalization of WMin. The introduction of 
weighted maximum was due to Dubois in [109] (see also [111] and [112]). 
[112] gives expressions of both WMin and WMax in terms of the median, 
and the expressions show that they are Sugeno integrals. A characteriza- 
tion of these operators can be found in [147]. OWMax and OWMin were 
introduced in [113]. Yager [443] defined later the Ordinal OWA. This cor- 
responds to a particular OWMax: when data is ordered in an increasing 
way, weights are decreasing. Ordinal OWA corresponds to the Sugeno in- 
tegral with respect to a symmetric fuzzy measure. So, it is analogous to 
the OWA operator that corresponds to the Choquet integral with respect 
to a symmetric fuzzy measure. [308] studies characterizations of these op- 
erators. 
Proposition 6.36 is proved in [14]. 


8. The Sugeno integral: The Sugeno integral was introduced by Sugeno 
in 1974 [384] (see [382] (1972) for a previous definition in Japanese). The 
graphical interpretation in Figure 6.11 (a) was given in [458], and the 
interpretation in Figure 6.11 (c) was given in [427]. Properties of Sugeno 
integrals are studied in detail in [427], and detailed references are given 
there. Propositions 6.46 and 6.47 are proved in [288]. Proposition 6.48 
is proved in [279]. Example 6.39 is based on [411]. [207] proves that the 
Sugeno integral can be expressed in terms of medians. This result is not 
reported in this book. 


9. Fuzzy integrals: [164] is one of the papers to consider four different 
spaces in integration, as given in Figure 6.13, 7, I*, F, and M. The 
t-conorm system for integration and the fuzzy t-conorm integral (or fuzzy 
t-integral) were introduced in [280]. Naturally, Choquet-like and Sugeno- 
like integrals also appear in that paper. The terms also appear later in [173, 
172]. 

The twofold integral was introduced in [407]. Its properties were studied 
in [292]. Its graphical representation (Figure 6.16) and its generalization 
to the continuous case (Figure 6.17) are given in [294]. Other fuzzy integrals 
and other generalizations of Choquet and Sugeno integrals also exist. The 
general fuzzy integral is one of such generalizations [41]. 
10. Hierarchical models: Aggregation operators that combine partial 
results previously obtained by other aggregation operators have been studied 
for some time. For example, the symmetrical mean is a hierarchical 
model. This operator, defined as an arithmetic mean of all permutations 
of a root-mean-power with weights $(\alpha_1, \ldots, \alpha_N)$ (with $\sum_i \alpha_i = 1$), was 
studied by Muirhead in 1903 [275]. Its definition is 


$\frac{1}{N!} \sum_{\sigma} \prod_{i=1}^{N} a_{\sigma(i)}^{\alpha_i}.$ 


The operator reduces to the arithmetic mean with weights (1,0, ...,0), 
and to the geometric mean with weights (1/N,...,1/N) (see [122], p. 45). 

Bullen, Mitrinovic and Vasic review in [50] (p. 191) the mixed means (a 
combination of root-mean-powers). The oldest related result is the Carlson 
function [61] (1970). 

The multistep Choquet integral, as defined in this book, corresponds 
to the definition in [276], which is an extension of the two-step integral 
in [265]. [265] proves that the weighted mean of Choquet integrals is a 
Choquet integral. Theorem 6.63 is given in [276] and [277]. Conditions on 
when a Choquet integral is decomposable in a hierarchical model are given 
by Fujimoto, Murofushi, and Sugeno in [155]. See also [151] for additional 
details. The first work in this direction was [281]. 

[292] studies multistep representations of Sugeno and twofold integrals 
in terms of the Choquet integral with constant. The Choquet integral with 
constant b of a function f with respect to a fuzzy measure $\mu$ is defined by 
$CI_\mu(f) + b$. The Choquet integral with constant is defined in [277]. 

Other hierarchical models are also present in the literature. For exam- 
ple, Calvo, Mesiarová, and Valásková [58] present a hierarchical model 
that generalizes the twofold integral. This generalization permits them 
to construct a dual of the twofold integral. [413] proposes another hier- 
archical model called the meta-knowledge model. In this definition, par- 
tial aggregations can be used to modify the fuzzy measures embedded in 
some Choquet integrals. The term meta-knowledge comes from hierarchi- 
cal fuzzy systems. In such settings, partial inferences permit us to modify 
the fuzzy rule base of another fuzzy system. Magdalena in [242] proposes 
such meta-knowledge in a fuzzy system. 

11. Other aggregation operators: For references on the median, 
arithmetic mean, weighted mean, and quasi-arithmetic mean, see Chapter 4. 
The Hurwicz operator is described in [132]. 

12. Operators for linguistic labels: Although in this chapter we have 
focused on operators for numerical information, some of the operators such 
as the weighted minimum/maximum and the Sugeno integral can also be 
applied to ordinal data. In Chapter 4, a few other aggregation operators 
were defined taking into account ordinal scales. Other operators exist for 
ordinal data. 

Linguistic aggregation operators encompass operators used for the ag- 
gregation of linguistic labels. Linguistic labels can be considered as en- 
riched ordinal scales, because each category in the ordinal scale might have 
associated with it some additional information. For example, in fuzzy-rule- 
based systems, each label has associated with it a fuzzy set on a given 
domain. 

[403] classifies such aggregation operators into three categories accord- 
ing to the underlying scale for the labels: (i) explicit quantitative or 
fuzzy scales, (ii) implicit numerical scale, (iii) no additional scale, with 
operators only considering the qualitative scale. The operator in [403] 
(which aggregates the labels, taking into account a semantics based on 
antonyms [95] and extended negation functions [393]) and operators that 
aggregate the fuzzy numbers belong to (i). Linguistic OWA [186], Linguis- 
tic WOWA [395], and Induced Linguistic operators [439] belong to (ii). 
Aggregation operators for ordinal scales such as weighted minimum or the 
Sugeno integral belong to (iii). Other operators that rely on operations 
on ordinal scales, e.g., t-norms and t-conorms, are also in (iii). This is 
the case for the ordinal weighted mean defined in [164] on t-norms and 
t-conorms [255]. 

The Linguistic OWA operator was based on the convex combination of 
linguistic terms proposed in [92]. It was further studied in [187]. 
7 Indices and Evaluation Methods 


(...) som així, i fins i tot en els moments 
que és millor estar-se quiet, 


tenim el desfici de prendre decisions.¹ 


P. Calders, [54] (p. 94) 


This chapter reviews some of the existing tools for evaluating aggregation 
methods and their parameters. We focus on some indices for fuzzy measures 
(Shapley and Banzhaf), an interaction index, and the degree of disjunction. 
Other methods exist. The influence function and other tools such as gross- 
error sensitivity and local-shift sensitivity developed in robust statistics (see 
Section 2.2.6) are of interest here. The tools permit us to have some knowledge 
on how a particular estimator might behave when embedded in a real system. 
In particular, we have seen that the influence function of the arithmetic mean 
is unbounded while that of the median is bounded. 

Graphical representations are another example of such tools. For example, 
we can represent graphically the outcome of a binary operator in the $[0,1] \times [0,1]$ 
region. This corresponds to a 3D representation. Alternatively, we can 
consider the representation of a subset of inputs. In the case of functions 
not satisfying unanimity (such as t-norms, t-conorms, or uninorms), we can 
consider the diagonal (i.e., $T(x, x)$ for $x \in [0,1]$). In the case of a binary 
aggregation operator $\mathbb{C}$, we might consider $\mathbb{C}(x, neg(x))$ for some negation 
neg. This function permits us to visualize the compensation between x and 
neg(x). We will denote this function by $C_{\mathbb{C}}$. That is, $C_{\mathbb{C}}(x) = \mathbb{C}(x, neg(x))$. 


Example 7.1. Let us consider the function $C_{\mathbb{C}}$ for some aggregation functions 
(taking $neg(x) = 1 - x$): 
• $C_{AM}(x) = (x + 1 - x)/2 = 1/2$ 
• $C_{GM}(x) = \sqrt{x(1 - x)}$ 

¹ (...) that’s the way we are, and even in the case when it is better to be still, we 
are eager to make decisions. 
• $C_{HM}(x) = 2(1 - x)x$ 


Fig. 7.1. Graphical representation of the functions $C_{min}$, $C_{AM}(x) = 1/2$, 
$C_{GM}(x) = \sqrt{x(1 - x)}$, $C_{HM}(x) = 2(1 - x)x$ 


Figure 7.1 illustrates the functions. It can be seen that min always returns 
the lowest value, followed by HM, GM, and, finally, AM. 
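
The ordering observed in the figure can be checked numerically. The following Python sketch evaluates the four functions on a grid, assuming the negation neg(x) = 1 − x used in the formulas of the example; the function names and the grid are choices made for this illustration.

```python
import math


def c_min(x: float) -> float:
    return min(x, 1 - x)


def c_hm(x: float) -> float:
    # Harmonic mean of x and 1 - x: 2 x (1 - x) / (x + (1 - x))
    return 2 * (1 - x) * x


def c_gm(x: float) -> float:
    return math.sqrt(x * (1 - x))


def c_am(x: float) -> float:
    return (x + (1 - x)) / 2


grid = [i / 100 for i in range(101)]
tolerance = 1e-12  # guards against floating-point rounding at x = 0.5

ordering_holds = all(
    c_min(x) <= c_hm(x) + tolerance
    and c_hm(x) <= c_gm(x) + tolerance
    and c_gm(x) <= c_am(x) + tolerance
    for x in grid
)
```

All four functions coincide at x = 1/2, where x and its negation are equal.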


Now, we will study some of the indices that can be applied to analyze the 
parameters of the aggregation operators and their influence on the outcome 
of the aggregation method. In particular, we review some of the power indices 
(e.g., the Shapley and Banzhaf values), interaction indices, and dispersion. 
Later, we consider evaluation methods that take into account the operator. 
At that point, we consider the average value and the degree of disjunction 
(orness). 


7.1 Indices of Power: Shapley and Banzhaf Power Indices 


When fuzzy measures on X are restricted to take values in {0, 1} (i.e., 0-1 fuzzy 
measures), they can be used to model coalitions of the individuals $x_i \in X$. 
In this case, for a given set $A \subseteq X$, the value $\mu(A)$ represents whether or 
not the set A has a winning position when making a decision; for example, 
in a given vote, whether a coincident vote for all $x_i$ in the set A ensures 
A's opinion being selected. This interpretation, born in game theory, leads to 
several indices to measure the power of a particular $x_i$ in X with respect to 
the winning positions. They are the indices of power, or power indices. 

Power indices are not only applied to 0-1 fuzzy measures, as in the above 
example, but they can also be applied to general fuzzy measures. In such 
cases, a fuzzy measure is interpreted as the value of a coalition. Then, power 
indices stand for a measure of the worth of $x_i$, or how much $\mu$ increases when 
$x_i$ is included in a coalition. 

Therefore, as fuzzy measures are used as parameters for fuzzy integrals 
(see Chapter 6), power indices are of interest in information fusion. This link 
is tightened by the fact that the application of fuzzy integrals to characteristic 
functions of a crisp set A yields the measure of A. For example, $CI_\mu(A) = \mu(A)$ 
and $SI_\mu(A) = \mu(A)$ (see Propositions 6.30 and 6.44). According to this, power 
indices for a fuzzy measure $\mu$ give a summarization of the influence of the $x_i$ in 
the outcome of fuzzy integrals when the values to be aggregated are restricted 
to 0 or 1, or, in general, they give a trend on the output for values in [0, 1]. 

Several power indices have been defined. We review and compare some of 
them (e.g., the Shapley and Banzhaf values) in the following sections. 
Typically, the indices are a function of the measure and of a particular $x_i$ in X. We 
will denote them by $\Phi_{x_i}(\mu)$, where $\Phi$ is the term used for a particular index 
name. 


7.1.1 Shapley Value 


When applied to 0-1 fuzzy measures, the Shapley value is a function that, for 
each $x_i$ in X, counts the number of times that $x_i$ changes a losing coalition 
into a winning one, i.e., the number of sets S such that, when including $x_i$ in 
the set, we have $\mu(S \cup \{x_i\}) = 1$ while $\mu(S) = 0$. When $\mu$ is not restricted to 
{0,1}, the Shapley value measures the variation of $\mu$ when $x_i$ enters into a 
set (or coalition). 

Formally, the Shapley value is defined for arbitrary fuzzy measures, while 
for 0-1 fuzzy measures it is known as the Shapley-Shubik index. The formal 
definition of the Shapley value is given below. 


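Definition 7.2. Given a fuzzy measure $\mu$ on $X$ with $N = |X|$, the Shapley
value of $\mu$ for $x_i$, denoted by $\varphi_{x_i}(\mu)$, is defined as follows:

\[
\varphi_{x_i}(\mu) := \sum_{S \subseteq X \setminus \{x_i\}}
  \frac{|S|!\,(N - |S| - 1)!}{N!} \bigl(\mu(S \cup \{x_i\}) - \mu(S)\bigr)
\tag{7.1}
\]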


There are several expressions for the Shapley value that are equivalent to this. 
We present some of them in a proposition below. 

The first expression is based on considering all possible total orders on
the set $X$. Naturally, there are $|X|!$ such orders. We use $\rho_X$ to denote
the set of these orders. Then, we need to consider the set of elements $x_j$
that precede $x_i$ in a particular ordering $r \in \rho_X$. We will denote such
a preceding set by $r_{x_i}$. The example below illustrates $\rho_X$ and
$r_{x_i}$ for $|X| = 3$.


Example 7.3. Let $X = \{x_1, x_2, x_3\}$; then, $\rho_X$, the set of all
possible total orders on $X$, is defined as

\[
\rho_X = \{(x_1, x_2, x_3), (x_1, x_3, x_2), (x_2, x_1, x_3),
           (x_2, x_3, x_1), (x_3, x_1, x_2), (x_3, x_2, x_1)\}.
\]

Naturally, there are $|X|! = 3! = 6$ different orders. Given
$r = (x_2, x_1, x_3)$, we have $r_{x_1} = \{x_2\}$, $r_{x_2} = \emptyset$, and
$r_{x_3} = \{x_2, x_1\}$.

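Now, we consider the alternative expressions for the Shapley value.

Proposition 7.4. Let $\mu$ be a fuzzy measure on $X$ with Möbius transform $m$,
let $N = |X|$, let $\rho_X$ be the set of orderings for $X$, and let
$\varphi_{x_i}(\mu)$ be the Shapley value of $\mu$ for $x_i$ in $X$; then, the
following equalities hold:

1. For all $x_i \in X$,

\[
\varphi_{x_i}(\mu) = \frac{1}{N!} \sum_{r \in \rho_X}
  \bigl(\mu(r_{x_i} \cup \{x_i\}) - \mu(r_{x_i})\bigr),
\tag{7.2}
\]

where $r_{x_i}$ denotes the set of $x \in X$ that precede $x_i$ in $r$.

2. For all $x_i \in X$,

\[
\varphi_{x_i}(\mu) = \frac{1}{N} \sum_{s=0}^{N-1} \frac{1}{\binom{N-1}{s}}
  \sum_{\substack{S \subseteq X \setminus \{x_i\} \\ |S| = s}}
  \bigl(\mu(S \cup \{x_i\}) - \mu(S)\bigr).
\tag{7.3}
\]

3. For all $x_i \in X$,

\[
\varphi_{x_i}(\mu) = \sum_{\{S : x_i \in S \subseteq X\}} \frac{m(S)}{|S|}.
\tag{7.4}
\]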


Expression 7.2 shows that the Shapley value for $x_i$ is the average gain
of adding $x_i$ over all possible positions in an order $r$. When the measure
is a 0-1 fuzzy measure, the value counts (up to the factor $1/N!$) the number
of orderings in which the element $x_i$ provokes the change from 0 to 1 with
its inclusion. Expression 7.3 also computes an average: first over the sets
$S$ of a given size $s$, and then over all sizes from $s = 0$ to $s = N - 1$.
The last alternative expression is based on the Möbius transform of $\mu$
instead of on $\mu$ itself.
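
The equivalence of Expressions 7.1, 7.2, and 7.4 can be checked numerically on
a small example. The following is a minimal Python sketch by brute-force
enumeration. The function names, the representation of the measure as a
dictionary keyed by frozensets, and the weighted-voting measure used as data
are assumptions made for illustration only.

```python
from itertools import combinations, permutations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def shapley_definition(mu, X, xi):
    """Equation 7.1: weighted sum of marginal gains over sets S without xi."""
    n = len(X)
    others = [x for x in X if x != xi]
    total = 0.0
    for S in subsets(others):
        weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
        total += weight * (mu[S | {xi}] - mu[S])
    return total


def shapley_orderings(mu, X, xi):
    """Equation 7.2: average marginal gain over all N! orderings."""
    total = 0.0
    for r in permutations(X):
        preceding = frozenset(r[: r.index(xi)])  # elements before xi in r
        total += mu[preceding | {xi}] - mu[preceding]
    return total / factorial(len(X))


def mobius(mu, X):
    """Moebius transform: m(A) = sum over B in A of (-1)^(|A|-|B|) mu(B)."""
    return {
        A: sum((-1) ** (len(A) - len(B)) * mu[B] for B in subsets(A))
        for A in subsets(X)
    }


def shapley_mobius(mu, X, xi):
    """Equation 7.4: sum of m(S)/|S| over the sets S that contain xi."""
    m = mobius(mu, X)
    return sum(m[S] / len(S) for S in subsets(X) if xi in S)


# Hypothetical 0-1 measure: a coalition wins when its total weight reaches 3.
X = ("x1", "x2", "x3")
votes = {"x1": 2, "x2": 1, "x3": 1}
mu = {S: 1.0 if sum(votes[x] for x in S) >= 3 else 0.0 for S in subsets(X)}

for xi in X:
    print(xi, shapley_definition(mu, X, xi),
          shapley_orderings(mu, X, xi), shapley_mobius(mu, X, xi))
```

For this measure the three expressions agree and give 2/3 for $x_1$ and 1/6 for
each of $x_2$ and $x_3$. The enumeration is exponential in $N$, so the sketch is
only meant for small sets.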

Characterizations of the Shapley value have been built showing that it
is the only operator that summarizes a fuzzy measure while satisfying some
basic properties. We consider one such characterization below.


7.1.2 Characterization of the Shapley Value 


Given a fuzzy measure $\mu$ on a set $X = \{x_1, \ldots, x_N\}$, the Shapley
value is a function of $\mu$ for each $x_i$ in $X$. Accordingly, Shapley values
can be seen as a vector $\varphi = (\varphi_{x_1}(\mu), \ldots,
\varphi_{x_N}(\mu))$. The characterization is based on the following concepts:
symmetry, efficiency, and additivity. We detail the concepts below.


Symmetry: The names of the elements $x_i$ do not play any relevant role in the
computation of $\varphi_{x_i}(\mu)$. This property is stated in terms of
permutations on $X$: Let $\pi$ be a permutation of $X$ (i.e., $\pi(x) \in X$,
and $\pi(x) \neq \pi(y)$ if and only if $x \neq y$); then, $\mu_\pi$ is defined
as $\mu_\pi(A) = \mu(\pi(A))$, where $\pi(A) = \bigcup_{a \in A} \{\pi(a)\}$.
Then, symmetry is fulfilled when, for all permutations $\pi$ on $X$ and all
$x \in X$,

\[
\varphi_{x}(\mu_\pi) = \varphi_{\pi(x)}(\mu).
\]


Efficiency or carrier axiom: The value is a summarization of the whole
fuzzy measure. That is, not only is the individual value $\mu(\{x_i\})$ taken
into account to compute $\varphi_{x_i}(\mu)$, but also the other values to
which the $x_i$ contribute. This is formalized in terms of the carrier concept.
A subset $C$ of $X$ is a carrier of $\mu$ if

\[
\mu(S) = \mu(C \cap S)
\]

for all $S \subseteq X$. Therefore, elements $x \in X$ that are outside a
carrier have no influence on the measure (they contribute nothing to any
coalition). That is, only the filled region in Figure 7.2 is relevant for
computing $\mu(S)$.

Fig. 7.2. Carrier axiom: subset $C$ is the carrier. In this case,
$\mu(S) = \mu(C \cap S)$

The efficiency axiom is formalized using the carrier concept as follows.
For any carrier $C$ in $X$,

\[
\sum_{x_i \in C} \varphi_{x_i}(\mu) = \mu(C).
\]


Additivity: When two independent fuzzy measures are combined, their values
must be added element by element. This is formalized as follows. For any two
fuzzy measures $\mu_1$ and $\mu_2$, and for all $x_i$ in $X$,

\[
\varphi_{x_i}(\mu_1) + \varphi_{x_i}(\mu_2) = \varphi_{x_i}(\mu_1 + \mu_2),
\]

where the fuzzy measure $\mu_1 + \mu_2$ is defined as
$(\mu_1 + \mu_2)(A) = \mu_1(A) + \mu_2(A)$ for all $A \subseteq X$.


Now, we can formulate the characterization of the Shapley value as follows. 


Theorem 7.5. A unique value function $\varphi$ exists satisfying symmetry,
efficiency, and additivity. It is the Shapley value.


That is, the Shapley value is the only index that satisfies the three conditions 
considered above. 


7.1.3 Banzhaf Value 


The application of the Shapley value shows that in some circumstances there
are sets that are counted twice. The following example illustrates this
situation.

Example 7.6. Let $\mu$ be a 0-1 fuzzy measure on $X = \{x_1, x_2, x_3\}$ such
that $\mu(\{x_1, x_2\}) = 0$ and $\mu(\{x_1, x_2, x_3\}) = 1$. Then, using
Expression 7.2, we can observe that the Shapley value for $x_3$ counts twice
the fact that $x_3$ changes $\mu(\{x_1, x_2\})$, equal to 0, to
$\mu(\{x_1, x_2, x_3\})$, equal to 1. This is so because both orderings,
$r_a = (x_1, x_2, x_3)$ and $r_b = (x_2, x_1, x_3)$, will be considered when
computing $\varphi_{x_3}(\mu)$.


The Banzhaf value, another power index, permits us to consider only once
the influence of a source $x_i$ in such a situation. This is achieved by
considering the pairs $S$ and $S \setminus \{x_i\}$. Note that, for the
particular set $S_1 = \{x_1, x_2, x_3\}$ in the example above, only $S_1$ and
$S_1 \setminus \{x_3\} = \{x_1, x_2\}$ would be considered. We define below two
variants of the Banzhaf index: the unnormalized (or nonstandardized) Banzhaf
index and the normalized one. Both indices rely on the concept of an $x$ in $X$
being essential in a set $S$ or, equivalently, on the concept of the swing
voter.


Definition 7.7. Let $\mu$ be a 0-1 fuzzy measure on $X$; then, for any
$S \subseteq X$, it is said that $x$ is an essential member in $S$ or,
equivalently, that $x$ is a swing voter if removing $x$ from $S$ changes the
measure from 1 to 0. In other words,

$x$ is essential if $\mu(S) - \mu(S \setminus \{x\}) = 1$.

Equivalently, in the context of game theory and coalitions, $x$ is a swing voter
when its removal moves the coalition from a winning situation to a losing one.

Both normalized and unnormalized Banzhaf indices for a source $x_i$ count
the number of sets in which $x_i$ is essential. They differ in the way the
proportion is computed. The unnormalized index divides the count by the
total number of coalitions of which $x_i$ is a member, that is, $2^{N-1}$. The
normalized Banzhaf index divides the count by the total number of swings, i.e.,
the sum over all members $x_j$ of the number of sets in which $x_j$ is
essential. Both definitions are given below.


Definition 7.8. Let $\mu$ be a fuzzy measure on $X$ with $N = |X|$; then,

1. The unnormalized (or nonstandardized or absolute) Banzhaf index of $\mu$
for $x_i$ is defined by

\[
\beta'_{x_i}(\mu) := \frac{\sum_{S \subseteq X}
  \bigl(\mu(S) - \mu(S \setminus \{x_i\})\bigr)}{2^{N-1}}.
\]

2. The Penrose index (or normalized Banzhaf index or relative Banzhaf index)
of $\mu$ for $x_i$ is defined by

\[
\beta_{x_i}(\mu) := \frac{\sum_{S \subseteq X}
  \bigl(\mu(S) - \mu(S \setminus \{x_i\})\bigr)}
  {\sum_{j=1}^{N} \sum_{S \subseteq X}
  \bigl(\mu(S) - \mu(S \setminus \{x_j\})\bigr)}.
\]

The normalized Banzhaf index can be expressed in terms of the unnormalized
one as follows:

\[
\beta_{x_i}(\mu) = \frac{\beta'_{x_i}(\mu)}{\sum_{j=1}^{N} \beta'_{x_j}(\mu)}.
\]


7.1.4 Properties


A probability model for interpreting the power indices has been defined con- 
sidering winning coalitions. The main idea is to consider the individual effect. 
This effect is defined as follows. 

Question of individual effect: What is the probability that my vote will affect the
outcome of the vote on a bill? In other words, what is the probability that a bill will
pass if I vote for it, but fail if I vote against it? (Straffin, [380])

Models about individual effects can be built by considering the probability
with which each $x_i$ in $X$ votes either for the bill or against it. Let $p_i$
be the probability of $x_i$ voting for the bill; then, for
$X = \{x_1, \ldots, x_N\}$, we have a probability vector $(p_1, \ldots, p_N)$.
Different assumptions can be considered with respect to the values $p_i$. In
particular, the following two seem natural in this framework.


Assumption 1 (Homogeneity assumption) A number $p$ is chosen from
the uniform distribution on $[0, 1]$, and $p_k = p$ for all $k$.


Assumption 2 (Independence assumption) Each $p_k$ is chosen independently
from the uniform distribution on $[0, 1]$.


Then, the following two theorems can be proved. 


Theorem 7.9. The individual effect of $x_i$ under the homogeneity assumption
is given by the Shapley value for $x_i$: $\varphi_{x_i}$.


Theorem 7.10. The individual effect of $x_i$ under the independence assumption
is given by the (unnormalized) Banzhaf value for $x_i$: $\beta'_{x_i}$.


According to these theorems, the Shapley and Banzhaf indices for 0-1 fuzzy
measures can be interpreted in terms of a probabilistic model. In this model,
the two indices differ only in the assumption made about the voting
probabilities $p_k$ of the voters $x_k$.
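
The two Banzhaf indices of Definition 7.8 and the probabilistic reading of
Theorems 7.9 and 7.10 can be illustrated with a small Python sketch. It assumes
a measure stored as a dictionary keyed by frozensets and a hypothetical
weighted-voting example; all function names are illustrative. Under the
independence assumption, the individual effect is linear in each $p_k$, so
averaging over independent uniform $p_k$ amounts to setting every $p_k = 1/2$.
Under the homogeneity assumption, the common value $p$ is averaged over
$[0, 1]$, here with a midpoint rule.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def swings(mu, X, xi):
    """Sum of mu(S) - mu(S minus {xi}) over the sets S that contain xi."""
    return sum(mu[S] - mu[S - {xi}] for S in subsets(X) if xi in S)


def banzhaf_unnormalized(mu, X, xi):
    return swings(mu, X, xi) / 2 ** (len(X) - 1)


def banzhaf_normalized(mu, X, xi):
    return swings(mu, X, xi) / sum(swings(mu, X, xj) for xj in X)


def shapley(mu, X, xi):
    """Shapley value following Equation 7.1."""
    n = len(X)
    others = [x for x in X if x != xi]
    return sum(
        factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
        * (mu[S | {xi}] - mu[S])
        for S in subsets(others)
    )


def individual_effect(mu, X, xi, p):
    """Probability that the vote of xi decides the outcome when every other
    voter xk independently votes for the bill with probability p[xk]."""
    others = [x for x in X if x != xi]
    total = 0.0
    for S in subsets(others):
        prob = 1.0
        for x in others:
            prob *= p[x] if x in S else 1.0 - p[x]
        total += prob * (mu[S | {xi}] - mu[S])
    return total


def homogeneous_effect(mu, X, xi, steps=1000):
    """Midpoint-rule average of the individual effect over a common p."""
    acc = 0.0
    for k in range(steps):
        p = (k + 0.5) / steps
        acc += individual_effect(mu, X, xi, {x: p for x in X})
    return acc / steps


# Hypothetical 0-1 measure: a coalition wins when its total weight reaches 3.
X = ("x1", "x2", "x3")
votes = {"x1": 2, "x2": 1, "x3": 1}
mu = {S: 1.0 if sum(votes[x] for x in S) >= 3 else 0.0 for S in subsets(X)}

half = {x: 0.5 for x in X}
for xi in X:
    print(xi, banzhaf_unnormalized(mu, X, xi), banzhaf_normalized(mu, X, xi),
          individual_effect(mu, X, xi, half), homogeneous_effect(mu, X, xi))
```

In this example $x_1$ is essential in three sets and $x_2$ and $x_3$ in one
each, so the unnormalized indices are 3/4, 1/4, 1/4 and the normalized ones
3/5, 1/5, 1/5, while the Shapley values are 2/3, 1/6, 1/6.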


7.2 Interaction 


Both Shapley and Banzhaf indices are computed for individual sources $x \in X$.
Interaction indices have been developed to measure to what extent two or
more elements interact in a given measure. These new indices can be seen
as a generalization of the former, where interaction, understood as either
complementarity or redundancy, is measured. The definition of the interaction
index is based on the S-derivative. Such a derivative is defined as follows.


Definition 7.11. Let $\mu$ be a fuzzy measure; then, given $S, T \subseteq X$,
$\Delta_S[\mu(T)]$ is the S-derivative of $\mu$ at $T$, and it is recursively
defined as

\[
\Delta_x[\mu(T)] = \mu(T \cup \{x\}) - \mu(T \setminus \{x\}),
\]
\[
\Delta_S[\mu(T)] = \Delta_x[\Delta_{S \setminus \{x\}}\mu(T)]
  \quad \text{for } x \in S,
\]

where $\Delta_x[\Delta_S \mu(T)]$ is defined as

\[
\Delta_x[\Delta_S \mu(T)] = \Delta_S[\mu(T \cup S \cup \{x\})]
  - \Delta_S[\mu((T \cup S) \setminus \{x\})].
\]

In fact, the equation for computing $\Delta_S[\mu(T)]$ holds for all $x \in S$,
as the order of selecting $x$ in $S$ is not relevant in the computation of the
derivative.

Induction on the cardinality of the set $S$ permits us to obtain a
nonrecursive expression for the S-derivative. This expression is

\[
\Delta_S[\mu(T)] = \sum_{K \subseteq S} (-1)^{|S| - |K|} \mu(T \cup K)
\]

for all $T \subseteq X \setminus S$. For all $S$ that are not subsets of a
carrier $C$ (i.e., a set $C$ with $\mu(A) = \mu(C \cap A)$ for all
$A \subseteq X$), and for all $T \subseteq X \setminus S$, we have
$\Delta_S[\mu(T)] = 0$.

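Now, we define the interaction index for two sources $x_i$ and $x_j$ in a
carrier $C$ of $X$ as follows:

\[
I(\mu, \{x_i, x_j\}) = \sum_{T \subseteq C \setminus \{x_i, x_j\}}
  \frac{(|C| - |T| - 2)!\,|T|!}{(|C| - 1)!} \,\Delta_{\{x_i, x_j\}}[\mu(T)].
\]

This is generalized for an arbitrary number of sources as follows:

\[
I(\mu, S) = \sum_{T \subseteq C \setminus S}
  \frac{(|C| - |T| - |S|)!\,|T|!}{(|C| - |S| + 1)!} \,\Delta_S[\mu(T)].
\]

This index is an extension of the Shapley value, as $I(\mu, \{x_i\})$ is
equivalent to the Shapley value of $x_i$ in $\mu$ for all $\mu$ and all
$x_i \in X$. That is, $I(\mu, \{x_i\}) = \varphi_{x_i}(\mu)$ (where
$\varphi_{x_i}(\mu)$ follows Definition 7.2).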
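
The interaction index can be sketched in Python using the nonrecursive
expression of the S-derivative. The sketch below takes the whole set $X$ as the
carrier (which is always a valid choice), stores the measure in a dictionary
keyed by frozensets, and uses a hypothetical weighted-voting measure; these
choices and the function names are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def derivative(mu, S, T):
    """Nonrecursive S-derivative at T, for T disjoint from S:
    sum over K in S of (-1)^(|S|-|K|) * mu(T union K)."""
    return sum((-1) ** (len(S) - len(K)) * mu[T | K] for K in subsets(S))


def interaction(mu, X, S):
    """Interaction index of the set S, taking the carrier C = X."""
    S = frozenset(S)
    n, s = len(X), len(S)
    rest = [x for x in X if x not in S]
    total = 0.0
    for T in subsets(rest):
        t = len(T)
        coeff = factorial(n - t - s) * factorial(t) / factorial(n - s + 1)
        total += coeff * derivative(mu, S, T)
    return total


# Hypothetical 0-1 measure: a coalition wins when its total weight reaches 3.
X = ("x1", "x2", "x3")
votes = {"x1": 2, "x2": 1, "x3": 1}
mu = {S: 1.0 if sum(votes[x] for x in S) >= 3 else 0.0 for S in subsets(X)}

print(interaction(mu, X, {"x1", "x2"}))  # positive: complementarity
print(interaction(mu, X, {"x2", "x3"}))  # negative: redundancy
print(interaction(mu, X, {"x1"}))        # singleton: Shapley value of x1
```

In this example $x_1$ and $x_2$ need each other to win (index 1/2), whereas
$x_2$ and $x_3$ are interchangeable partners of $x_1$ (index -1/2). For a
singleton the index reduces to the Shapley value, here 2/3 for $x_1$.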


7.3 Dispersion 


In most aggregation operators (all other indices being equal), it is often con- 
sidered inappropriate to accumulate the weights or importances into a single 
source. Instead, the weights are distributed among the sources to maximize 
dispersion. This is done to reduce the influence of a particular source (see 
Section 2.2.6 about the use of influence functions for this purpose). 

Entropy is a measure that can be used for evaluating the dispersion. As
a weighting vector $p$ is equivalent to a probability distribution, the standard
definition of entropy is appropriate.

Definition 7.12. Let $p = (p_1, \ldots, p_N)$ be a weighting vector; then, its
entropy (dispersion) $E$ is defined as

\[
E(p) := - \sum_{i=1}^{N} p_i \log p_i,
\]

with $0 \log 0$ defined as 0 (to allow for zero weights).


Alternatively, with the function $h$,

\[
h(x) :=
\begin{cases}
  -x \log x & \text{if } x > 0 \\
  0 & \text{if } x = 0,
\end{cases}
\tag{7.7}
\]

the entropy can be defined as follows.


Definition 7.13. Let $p = (p_1, \ldots, p_N)$ be a weighting vector; then, its
entropy (dispersion) $E$ is defined as

\[
E(p) := \sum_{i=1}^{N} h(p_i).
\]


For nonnegative weights adding up to 1 (weighting vectors following
Definition 6.1), the expression $E$ is maximal when all the weights are equal
(i.e., $p_i = 1/N$). The maximal value obtained is $E(p) = \log N$. In contrast,
the minimal value $E(p) = 0$ is obtained when $p_i = 1$ for one $i$.

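A concept related to dispersion is variability. The variability is defined as
the variance of the weights.


Definition 7.14. Let $p = (p_1, \ldots, p_N)$ be a weighting vector; then, its
variance (variability) is defined as

\[
\sigma^2(p) := E[(p - E[p])^2].
\]

So, as the mean of the weights is $E[p] = 1/N$, the variability can be computed
as follows:

\[
\sigma^2(p) = \frac{1}{N} \sum_{i=1}^{N} \Bigl(p_i - \frac{1}{N}\Bigr)^2
            = \frac{1}{N} \sum_{i=1}^{N} p_i^2 - \frac{1}{N^2}.
\]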
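
Both dispersion measures are direct to compute. The following Python sketch
assumes the natural logarithm in $h$ (the base only rescales the entropy) and
the population variance of the weights; the function names are illustrative.

```python
from math import log


def h(x):
    """Function h of Equation 7.7, with h(0) = 0."""
    return -x * log(x) if x > 0 else 0.0


def entropy(p):
    """Entropy (dispersion) of a weighting vector, as in Definition 7.13."""
    return sum(h(pi) for pi in p)


def variance(p):
    """Variance (variability) of the weights around their mean."""
    n = len(p)
    mean = sum(p) / n
    return sum((pi - mean) ** 2 for pi in p) / n


uniform = [0.25, 0.25, 0.25, 0.25]
dictator = [1.0, 0.0, 0.0, 0.0]
print(entropy(uniform), variance(uniform))    # maximal entropy, zero variance
print(entropy(dictator), variance(dictator))  # zero entropy, maximal variance
```

The two extreme vectors behave in opposite ways: equal weights maximize the
entropy and minimize the variance, while concentrating all the weight on one
source does the reverse.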


7.3.1 Entropy for Fuzzy Measures 


There exist two alternative definitions for entropy concerning fuzzy measures.
They are the lower and upper entropies. We define them below using the
function $h$ given above.

Definition 7.15. Let $\mu$ be a fuzzy measure on $X = \{x_1, \ldots, x_N\}$;
then, the lower entropy $E_l$ of $\mu$ is defined by

\[
E_l(\mu) := \sum_{i=1}^{N} \sum_{T \subseteq X \setminus \{x_i\}}
  \gamma_{|T|}(N)\, h\bigl(\mu(T \cup \{x_i\}) - \mu(T)\bigr),
\tag{7.8}
\]

where $h$ is defined as in Equation 7.7, and

\[
\gamma_t(n) := \frac{(n - t - 1)!\, t!}{n!}.
\tag{7.9}
\]

Definition 7.16. Let $\mu$ be a fuzzy measure on $X = \{x_1, \ldots, x_N\}$;
then, the upper entropy $E_u$ of $\mu$ is defined by

\[
E_u(\mu) := \sum_{i=1}^{N} h\Bigl( \sum_{T \subseteq X \setminus \{x_i\}}
  \gamma_{|T|}(N) \bigl(\mu(T \cup \{x_i\}) - \mu(T)\bigr) \Bigr),
\tag{7.10}
\]

with $\gamma_t(n)$ defined as above.


In this second definition, the entropy of a fuzzy measure $\mu$ corresponds to
the entropy of the Shapley value (see Section 7.1.1) of the measure $\mu$.
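
A minimal Python sketch of the two entropies follows. It assumes the natural
logarithm in $h$, a measure stored as a dictionary keyed by frozensets, and two
hypothetical measures built from the same weights: an additive one and a
symmetric one whose value depends only on the cardinality of the set (the
indexing convention of the symmetric measure is an assumption and does not
affect the entropies).

```python
from itertools import combinations
from math import factorial, log


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def h(x):
    """Function h of Equation 7.7 (natural logarithm assumed)."""
    return -x * log(x) if x > 0 else 0.0


def gamma(t, n):
    """Coefficient of Equation 7.9."""
    return factorial(n - t - 1) * factorial(t) / factorial(n)


def lower_entropy(mu, X):
    """Equation 7.8: h is applied to every marginal gain."""
    n = len(X)
    total = 0.0
    for xi in X:
        others = [x for x in X if x != xi]
        for T in subsets(others):
            total += gamma(len(T), n) * h(mu[T | {xi}] - mu[T])
    return total


def upper_entropy(mu, X):
    """Equation 7.10: h is applied to the Shapley value of each element."""
    n = len(X)
    total = 0.0
    for xi in X:
        others = [x for x in X if x != xi]
        shapley = sum(gamma(len(T), n) * (mu[T | {xi}] - mu[T])
                      for T in subsets(others))
        total += h(shapley)
    return total


X = ("x1", "x2", "x3")
weights = (0.5, 0.3, 0.2)

# Additive measure: mu(A) is the sum of the weights of the elements of A.
p = dict(zip(X, weights))
additive = {S: sum(p[x] for x in S) for S in subsets(X)}

# Symmetric measure: mu(A) is the sum of the first |A| weights.
symmetric = {S: sum(weights[:len(S)]) for S in subsets(X)}

print(lower_entropy(additive, X), upper_entropy(additive, X))
print(lower_entropy(symmetric, X), upper_entropy(symmetric, X))
```

For the additive measure both entropies coincide with the entropy of the
weighting vector. For the symmetric measure the lower entropy is again the
entropy of the weights, while the upper entropy is $\log 3$ because all Shapley
values equal 1/3.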
The following properties have been proved for the entropies.


Proposition 7.17. Let $E_l$ and $E_u$ be defined as above; then, the following
hold:

1. $E_l(\mu)$ and $E_u(\mu)$ are symmetric with respect to the permutation of
the sources (the permutation of a measure follows the definition in
Section 7.1.2):

\[
E_l(\mu_\pi) = E_l(\mu) \quad \text{and} \quad E_u(\mu_\pi) = E_u(\mu)
\]

for all permutations $\pi$.
2. $E_l(\mu) \leq E_u(\mu)$ for all $\mu$.
3. $E_l(\mu) = E_u(\mu)$ if and only if $\mu$ is additive.
4. When $\mu$ is additive, if $\mu$ is inferred from $p$, then
$E_l(\mu) = E_u(\mu) = E(p)$. This implies that the entropy of a fuzzy measure
reduces to the entropy of a weighting vector, and is consistent with the fact
that the Choquet integral with respect to $\mu$ corresponds to the weighted
mean with respect to $p$.
5. Given a symmetric fuzzy measure generated from a weighting vector $w$
(using Proposition 6.25), its lower entropy is

\[
\sum_{i=1}^{N} h(w_i)
\]

and its upper entropy is $\log N$.

This roughly corresponds to the lower and upper entropies of the OWA
operator with weights $w$.


7.4 Average Values


A simple index for any aggregation operator is its average value, i.e., the
integral of the operator over all possible inputs.


Definition 7.18. Let $C$ be an aggregation operator in $[0, 1]^N$ with
parameter $P$; then, the average value of $C_P$ is defined as

\[
AV(C_P) := \int_0^1 \cdots \int_0^1 C_P(a_1, \ldots, a_N) \, da_1 \cdots da_N.
\]


We give below the average value for some aggregation operators. 


Proposition 7.19. The average value for the minimum (min), the maximum
(max), and the arithmetic mean (AM) is as follows:

• $AV(\min) = 1/(N + 1)$
• $AV(\max) = N/(N + 1)$
• $AV(AM) = 1/2$
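
The average value of Definition 7.18 can be approximated numerically. The
following Python sketch uses a midpoint rule on a regular grid of
$[0, 1]^N$ (the function name and the grid size are illustrative choices) and
compares the result with the closed forms $1/(N+1)$ for the minimum,
$N/(N+1)$ for the maximum, and $1/2$ for the arithmetic mean.

```python
from itertools import product


def average_value(op, n, steps=40):
    """Midpoint-rule approximation of the integral of op over [0, 1]^n."""
    pts = [(k + 0.5) / steps for k in range(steps)]
    return sum(op(a) for a in product(pts, repeat=n)) / steps ** n


def arithmetic_mean(a):
    return sum(a) / len(a)


for n in (2, 3):
    print(n,
          average_value(min, n), 1 / (n + 1),
          average_value(max, n), n / (n + 1),
          average_value(arithmetic_mean, n), 0.5)
```

The grid has `steps ** n` points, so this approach is only practical for small
dimensions; for larger $N$, Monte Carlo sampling would be a natural
alternative.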


7.5 Orness or the Degree of Disjunction 


Aggregation operators (as explained in Section 1.1, Equation 1.1) yield values 
between the minimum and the maximum. However, their behavior with re- 
spect to minimum and maximum is not the same for all of them. While some 
operators always yield values near the minimum, others yield values near the 
maximum; and some yield values that can be near either the minimum or 
the maximum (depending on the operator parameterization). Due to this, 
one way to evaluate the behavior of an operator is to measure its similarity 
to the maximum (or minimum) operator. To compute the similarity between 
operators, the average value introduced in Definition 7.18 can be used. 

The index that computes the similarity with the maximum is the degree
of disjunction or orness. The name comes from the use of the maximum in fuzzy
logic to model disjunction or the “or” connective (see Section 2.3.1). In
an analogous way, the similarity with the minimum corresponds to the degree
of conjunction or andness, as in fuzzy logic the minimum is used to model
conjunction or the “and” connective.

We now define the degree of disjunction or orness. 


Definition 7.20. Let $C$ be an aggregation operator with parameters $P$; then,
the orness of $C_P$ is defined by

\[
\mathrm{orness}(C_P) := \frac{AV(C_P) - AV(\min)}{AV(\max) - AV(\min)}.
\tag{7.11}
\]

The degree of conjunction or andness can be defined either with an expression
measuring similarity with the minimum or, equivalently, in terms of the orness.


Definition 7.21. The andness of $C_P$ is defined by

\[
\mathrm{andness}(C_P) := 1 - \mathrm{orness}(C_P)
  = \frac{AV(\max) - AV(C_P)}{AV(\max) - AV(\min)}.
\]


Naturally, these definitions are such that the following equations hold:


Proposition 7.22. Orness and andness satisfy the following properties:

\[
\begin{array}{ll}
\mathrm{orness}(\max) = 1 & \mathrm{andness}(\max) = 0 \\
\mathrm{orness}(\min) = 0 & \mathrm{andness}(\min) = 1 \\
\mathrm{orness}(AM) = 1/2 & \mathrm{andness}(AM) = 1/2
\end{array}
\]


Simplified expressions for the orness can be found for particular aggrega- 
tion operators. In the next proposition we give the corresponding expressions 
for the OWA operator (see Definition 6.4) and the Choquet integral (see Def- 
inition 6.17). 


Proposition 7.23. Let orness be defined as in Definition 7.20; then, the
following equations hold:

\[
\mathrm{orness}(WM_p) = 1/2
\]
\[
\mathrm{orness}(GM_{p=(p_1, \ldots, p_N)}) =
  \frac{N+1}{N-1} \Bigl(\frac{N}{N+1}\Bigr)^{N} - \frac{1}{N-1}
\]
\[
\mathrm{orness}(OWA_w) = \frac{1}{N-1} \sum_{i=1}^{N} (N - i) w_i
\]
\[
\mathrm{orness}(HM_{p=(p_1, p_2)}) = 0.2274 \quad \text{and} \quad
\mathrm{orness}(HM_{p=(p_1, p_2, p_3)}) = 0.2257
\]
\[
\mathrm{orness}(CI_\mu) = \frac{1}{N-1} \sum_{A \subseteq X}
  \frac{N - |A|}{|A| + 1} m(A),
\]

where $m$ is the Möbius transform of $\mu$ (see Definition 5.14).


Here, GM is the geometric mean and HM is the harmonic mean (see Section 4.2).
It can be proved that

\[
\mathrm{orness}(GM_{p=(p_1, \ldots, p_N)}) <
\mathrm{orness}(GM_{p=(p_1, \ldots, p_{N+1})}).
\]

For example, for $N = 2$, orness(GM) = 1/3 = 0.3333, and for $N = 3$,
orness(GM) = 11/32 = 0.3437.
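
Some of these values can be checked numerically. The Python sketch below
approximates the orness of Definition 7.20 with a midpoint rule and compares it
with closed forms for the OWA operator and the (unweighted) geometric mean.
The function names, the grid sizes, and the convention that the first OWA
weight is applied to the largest input are assumptions for illustration.

```python
from itertools import product


def average_value(op, n, steps):
    """Midpoint-rule approximation of the integral of op over [0, 1]^n."""
    pts = [(k + 0.5) / steps for k in range(steps)]
    return sum(op(a) for a in product(pts, repeat=n)) / steps ** n


def orness(op, n, steps=200):
    """Orness as in Definition 7.20, with numerical average values."""
    av_min = average_value(min, n, steps)
    av_max = average_value(max, n, steps)
    return (average_value(op, n, steps) - av_min) / (av_max - av_min)


def owa(w):
    """OWA operator; w[0] is applied to the largest input (assumed convention)."""
    return lambda a: sum(wi * ai for wi, ai in zip(w, sorted(a, reverse=True)))


def orness_owa(w):
    """Closed form for the OWA operator: sum of (N - i) w_i over N - 1."""
    n = len(w)
    return sum((n - i) * wi for i, wi in enumerate(w, start=1)) / (n - 1)


def geometric_mean(a):
    prod = 1.0
    for ai in a:
        prod *= ai
    return prod ** (1.0 / len(a))


def harmonic_mean(a):
    return len(a) / sum(1.0 / ai for ai in a)


def orness_gm(n):
    """Closed form for the unweighted geometric mean of n inputs."""
    return (n + 1) / (n - 1) * (n / (n + 1)) ** n - 1 / (n - 1)


w = [0.5, 0.3, 0.2]
print(orness_owa(w), orness(owa(w), 3, steps=40))
print(orness_owa(w) + orness_owa(w[::-1]))  # reversed weights: the sum is 1
print(orness_gm(2), orness(geometric_mean, 2))
print(orness(harmonic_mean, 2))
```

The numerical estimates should agree with the closed forms up to the grid
error, and reversing the OWA weights yields complementary orness values.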
The following property can be proved for the orness of the OWA operator. 


Proposition 7.24. Given a weighting vector $w$ and a weighting vector $w'$
satisfying $w'_i = w_{N-i+1}$ for all $i$,

\[
\mathrm{orness}(OWA_w) = 1 - \mathrm{orness}(OWA_{w'}).
\]


7.5.1 Orness for Fuzzy Quantifiers


The orness definition given above for the OWA weighting vectors can yield 
different values when applied to weighting vectors extracted from the same 
quantifier but with different dimensions. That is, the orness of the OWA op- 
erator depends on the dimension when weights are extracted from the same 
quantifier. To make comparisons among quantifiers easier, and to make their 
evaluation possible, the orness of a quantifier has been defined. The measure 
is defined in such a way that, when weighting vectors of increasing dimension 
are extracted, their orness tends to be the orness of the quantifier. So, the 
orness of weighting vectors approximates the orness of the quantifier. 

The definition for the quantifier is based on a rewriting of the expression
of the orness for the OWA in Proposition 7.23. Note that

\[
\begin{aligned}
\mathrm{orness}(w) &= \frac{1}{N-1} \sum_{i=1}^{N} (N - i) w_i \\
  &= \frac{N-1}{N-1} w_1 + \frac{N-2}{N-1} w_2 + \cdots + \frac{N-N}{N-1} w_N \\
  &= \sum_{i=1}^{N-1} \frac{1}{N-1} \, Q(i/N),
\end{aligned}
\tag{7.12}
\]

where $Q$ is the quantifier associated with weights $w$. That is, $Q$ is the
quantifier that has been used to extract the $w_i$ or, alternatively, the
quantifier that interpolates the points $\{(i/N, \sum_{j \leq i} w_j)\}$.

To generalize the orness measure, we consider the following expression
instead of the previous ones:

\[
\sum_{i=1}^{N} \frac{1}{N} \, Q(i/N).
\tag{7.13}
\]

While it is clear that Expressions 7.12 and 7.13 tend to be equal for large
values of $N$, the latter expression can be easily generalized by means of an
integral. We will refer to this generalization as the continuous orness. Its
definition is as follows.


Definition 7.25. Given a fuzzy quantifier $Q$, the continuous orness measure
for $Q$ is defined as

\[
\mathrm{orness}(Q) := \int_0^1 Q(x) \, dx.
\tag{7.14}
\]


This expression permits us to study the orness for some families of quantifiers
without focusing on particular dimensions. Additionally, it permits us to
visualize that all fuzzy quantifiers with the same area in $[0, 1]$ are
equivalent with respect to orness. We consider some families of fuzzy
quantifiers below, giving analytical expressions for their orness.

Fig. 7.3. $\alpha$-trimmed fuzzy quantifiers


     λ     continuous    Yager's orness   Yager's orness   Yager's orness
             orness         (N = 5)          (N = 10)         (N = 20)
   -0.99     0.7930         0.8473           0.8213           0.8074
   -0.90     0.6768         0.7114           0.6943           0.6856
   -0.50     0.5573         0.5687           0.5630           0.5602
    0.00     0.5            0.5              0.5              0.5
    0.50     0.4663         0.4596           0.4629           0.4646
    1.00     0.4427         0.4312           0.4370           0.4398
    1000     0.1437         0.0826           0.1105           0.1265

Table 7.1. Comparison of orness measures


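Example 7.26. The $\alpha$-trimmed quantifier $Q^\alpha$ is defined as

\[
Q^\alpha(x) :=
\begin{cases}
  0 & \text{if } x < \alpha \\
  \frac{x - \alpha}{1 - 2\alpha} & \text{if } \alpha \leq x \leq 1 - \alpha \\
  1 & \text{if } x > 1 - \alpha
\end{cases}
\tag{7.15}
\]

for $\alpha < 0.5$.


Note that the orness of these fuzzy quantifiers equals 0.5, and that the
OWA operator with these fuzzy quantifiers corresponds to the $\alpha$-trimmed
mean. Figure 7.3 represents some quantifiers of this family.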

We now consider two other families of fuzzy quantifiers: 


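Definition 7.27. Let Sugeno $\lambda$-quantifiers and Yager $\alpha$-quantifiers
be defined as follows.


Sugeno $\lambda$-quantifier: for $\lambda > -1$, when $\lambda = 0$,
$Q_\lambda(x) = x$, and when $\lambda \neq 0$,

\[
Q_\lambda(x) = \frac{e^{x \ln(1 + \lambda)} - 1}{\lambda}.
\]

Yager $\alpha$-quantifier: for $\alpha > 0$,

\[
Q_\alpha(x) = x^\alpha.
\]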


[Figure 7.4: left panel, the curve $1/\ln(1 + \lambda) - 1/\lambda$; right panel, the curve $1/(\alpha + 1)$]

Fig. 7.4. Orness of the Sugeno $\lambda$-quantifiers for values of
$\lambda \in (-1, 100]$ (left) and of the Yager $\alpha$-quantifiers for values
of $\alpha \in (0, 100]$ (right)


$Q_\lambda$ is called a Sugeno $\lambda$-quantifier because $Q_\lambda$
generates a distorted probability with any probability distribution $p$. That
is, $\mu = Q_\lambda \circ p$ is a Sugeno $\lambda$-measure for any probability
distribution $p$ and $\lambda > -1$. $Q_\lambda$ corresponds to the distortion
function in Corollary 5.68.


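Proposition 7.28. The orness of the Sugeno $\lambda$-quantifier is

\[
\mathrm{orness}(Q_\lambda) = \frac{1}{\ln(1 + \lambda)} - \frac{1}{\lambda}.
\]

When $\lambda = 0$, $\mathrm{orness}(Q_\lambda) = 1/2$, which corresponds to the
left and right limit.


The orness of the Yager $\alpha$-quantifier is

\[
\mathrm{orness}(Q_\alpha) = \frac{1}{\alpha + 1}.
\]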


Table 7.1 gives the orness for several Sugeno $\lambda$-quantifiers. The table
includes the continuous orness and the orness for dimensions $N = 5$, 10, and
20. The convergence of the measure to the continuous orness as $N$ increases
can be observed.
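
The entries of Table 7.1 can be recomputed with a few lines of Python. The
sketch below assumes that the OWA weights are extracted from a quantifier $Q$
as $w_i = Q(i/N) - Q((i-1)/N)$ and that the orness of a weighting vector is the
OWA expression of Proposition 7.23; the function names are illustrative.

```python
from math import log


def sugeno_quantifier(lam):
    """Sugeno lambda-quantifier of Definition 7.27 (lam > -1)."""
    if lam == 0:
        return lambda x: x
    # (1 + lam) ** x equals exp(x * ln(1 + lam))
    return lambda x: ((1.0 + lam) ** x - 1.0) / lam


def owa_weights(Q, n):
    """Weights extracted from the quantifier Q for dimension n."""
    return [Q(i / n) - Q((i - 1) / n) for i in range(1, n + 1)]


def orness_owa(w):
    """Orness of an OWA weighting vector: sum of (N - i) w_i over N - 1."""
    n = len(w)
    return sum((n - i) * wi for i, wi in enumerate(w, start=1)) / (n - 1)


def continuous_orness_sugeno(lam):
    """Continuous orness of the Sugeno quantifier (1/2 in the limit lam = 0)."""
    if lam == 0:
        return 0.5
    return 1.0 / log(1.0 + lam) - 1.0 / lam


for lam in (-0.99, -0.90, -0.50, 0.0, 0.50, 1.00, 1000):
    Q = sugeno_quantifier(lam)
    row = [orness_owa(owa_weights(Q, n)) for n in (5, 10, 20)]
    print(lam, continuous_orness_sugeno(lam), row)
```

The printed values should match the table to within about $10^{-4}$, and each
row approaches the continuous orness as the dimension grows.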

A graphical representation of the orness for $Q_\lambda$, with
$\lambda \in (-1, 100)$, is given in Figure 7.4 (left). This figure shows that,
for large values of $\lambda$, we need a large variation in the parameter to
obtain relevant changes in the orness. Figure 7.4 (right) represents the orness
for $Q_\alpha$ quantifiers for $\alpha \in (0, 100)$. In this case, the orness
moves rapidly from 1 to 0.

An important result for the two families of fuzzy quantifiers is the following.


Proposition 7.29. The orness of the Sugeno $\lambda$-quantifier
($Q_\lambda(x)$) and of the Yager $\alpha$-quantifier ($Q_\alpha(x)$) are
strictly monotone decreasing functions with respect to the parameters $\lambda$
and $\alpha$.


This proposition will be used in Chapter 8.2, as it is useful for determining
the weights in the OWA operator.

Fig. 7.5. Two families of uninorms


7.5.2 Pointwise Orness: Orness Distribution Function 


Considering the orness as a pointwise property instead of a global
one leads to the definition of the orness distribution function. This is
motivated by the fact that an aggregation operator can be defined in such a way
that in some subdomains it behaves like an and operator, while in other
subdomains it behaves like an or operator. Then, the orness is defined for each
point in $[0, 1]^N$ as the similarity between the outcome of the operator and
the maximum. In this way, an orness distribution is defined for all
$[0, 1]^N$, and then, if needed, average values can be computed to summarize the
information.

Uninorms are examples of operators that do not have such a uniform 
behavior on the entire domain. As we saw in Section 4.1.1, uninorms have 
a conjunctive region and a disjunctive region. For example, in the case of 
Figure 4.1 (reproduced here in Figure 7.5), we have the region with a t-norm, 
the region with a t-conorm, and the regions with minimum or maximum. So, 
there are regions with a low orness (the regions with a t-norm) and regions 
with a high orness (the regions with a t-conorm). 

This pointwise orness definition resembles the previous expression (see Ex- 
pression 7.11). The main difference is that now no average value is computed, 
but the orness distribution function is defined on the space of inputs. 


Definition 7.30. Let $C$ be an aggregation operator with parameters $P$; then,
the orness distribution function of $C_P$ is defined by

\[
\mathrm{odf}(C_P, a) := \frac{C_P(a) - \min(a)}{\max(a) - \min(a)}
\tag{7.16}
\]

for all $a \neq (a', \ldots, a')$.


Definition 7.30 is properly defined, as Expression 7.16 can be properly 
computed (i.e., it does not diverge to infinity) when the input vector a tends 
to a situation in which all the inputs are equal. 

Now, we can consider the computation of some indices for this distribution
so that its information is summarized. We will consider an orness average
value.

Definition 7.31. Let $C$ be an aggregation operator with parameters $P$, and let
$\mathrm{odf}(C_P)$ be its orness distribution function; then, its orness
average value is defined by

\[
\overline{\mathrm{odf}}(C_P) := \int_0^1 \cdots \int_0^1
  \mathrm{odf}(C_P, (a_1, \ldots, a_N)) \, da_1 \cdots da_N.
\]


Simplified expressions are known for some aggregation operators.


Proposition 7.32. The following relations hold:

• $\overline{\mathrm{odf}}(WM_p) = 0.5$
• $\overline{\mathrm{odf}}(OWA_w) = \mathrm{orness}(OWA_w)$
• $\overline{\mathrm{odf}}(UN_*) = (1 - e)^N$
• $\overline{\mathrm{odf}}(UN^*) = 1 - e^N$
• $\overline{\mathrm{odf}}(GM) = 0.385$ (for $N = 2$)
• $\overline{\mathrm{odf}}(HM) = 0.306$ (for $N = 2$)


Here, $UN_*$ and $UN^*$ correspond to the uninorms defined in Section 4.1.1,
and GM and HM correspond, respectively, to the geometric and harmonic means
(Section 4.2).
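
The orness distribution function and its average value can be approximated
numerically. The Python sketch below evaluates the function on a midpoint grid
and skips the points where all inputs are equal, since the function is not
defined there; the function names, the grid size, and the weights of the
weighted mean are assumptions for illustration.

```python
from itertools import product


def odf(op, a):
    """Orness distribution function at a; requires max(a) > min(a)."""
    lo, hi = min(a), max(a)
    return (op(a) - lo) / (hi - lo)


def odf_average(op, n, steps=200):
    """Average of odf over a midpoint grid of [0, 1]^n, skipping the
    grid points whose coordinates are all equal."""
    pts = [(k + 0.5) / steps for k in range(steps)]
    total, count = 0.0, 0
    for a in product(pts, repeat=n):
        if max(a) > min(a):
            total += odf(op, a)
            count += 1
    return total / count


def weighted_mean(a, p=(0.3, 0.7)):
    return sum(pi * ai for pi, ai in zip(p, a))


def geometric_mean(a):
    prod = 1.0
    for ai in a:
        prod *= ai
    return prod ** (1.0 / len(a))


def harmonic_mean(a):
    return len(a) / sum(1.0 / ai for ai in a)


print(odf_average(weighted_mean, 2))
print(odf_average(geometric_mean, 2))
print(odf_average(harmonic_mean, 2))
```

For $N = 2$ a direct computation of the integrals gives $2 \ln 2 - 1 \approx
0.386$ for the geometric mean and $1 - \ln 2 \approx 0.307$ for the harmonic
mean, which is close to the rounded values quoted in the proposition; the grid
estimates should fall within a few thousandths of these.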

Although this proposition shows that the orness average value of the OWA
operator is equal to its orness, this is not the usual case. Note that
equality does not hold for HM and GM (see Proposition 7.23).

Note also that equivalence between the orness and the average value of the
orness distribution corresponds to the equivalence between

\[
\overline{\mathrm{odf}}(C_P)
  = \int_0^1 \cdots \int_0^1 \mathrm{odf}(C_P, (a_1, \ldots, a_N))
      \, da_1 \cdots da_N
  = \int_0^1 \cdots \int_0^1 \frac{C_P(a) - \min(a)}{\max(a) - \min(a)}
      \, da_1 \cdots da_N
\]

and

\[
\mathrm{orness}(C_P) = \frac{AV(C_P) - AV(\min)}{AV(\max) - AV(\min)}.
\]


7.5.3 Interpretation 


An important point of the orness measure is that it can be interpreted be- 
yond its definition in terms of maximum and minimum. Some of the existing 
interpretations are given below. Interpretations are rooted in some examples. 


Similarity to maximum: The simplest interpretation is that orness corresponds
to the similarity to the maximum operator. In this case, similarity can be
computed from a global perspective (as in Definition 7.20) or as the average of
a pointwise property (as in Definition 7.31).

Compensation: In this case, orness measures to what extent a bad score (a
small value) for one of the inputs influences the output. When full
compensation is allowed (orness is maximum and equal to one), any good value (a
large value) overcomes any bad one. That is, the maximum is selected. In
contrast, when no compensation is allowed (orness is equal to zero), bad
values (the smaller ones) overcome all good values, no matter how many
good values there are.

Optimism: Under this interpretation, the orness corresponds to a measure 
of the optimism of a decision maker, while the andness corresponds to 
pessimism. With regard to the aggregation of utilities, the larger the or- 
ness, the larger the risk the decision maker is accepting (the aggregated 
utility will be larger for the same inputs with larger ornesses). In contrast, 
when the orness is small, the decision maker is giving more importance 
to small utilities, and, thus, is more concerned about risk. Therefore, the 
decision maker, with regard to risk, is more pessimistic (less optimistic). 

Fuzzy quantifier: The orness is interpreted in terms of the orness of a fuzzy 
quantifier. It is known that the “for all” quantifier has an orness equal to 
0. When this quantifier is applied, all criteria are required to be satisfied. 
This corresponds to a conjunction of all criteria. In contrast, the orness of 
the “there exists” is equal to 1. In this case, only the satisfaction of one 
of the criteria is required. Therefore, this is disjunctive behavior. Other 
quantifiers can also be used, and the corresponding orness is interpreted 
in terms of such quantifiers (e.g., about 50%, a third). 


7.6 Bibliographical Notes 


1. Power indices: Power indices have been developed in game theory, the 
von Neumann and Morgenstern book [425] being the preeminent reference 
in the area. For essays on the history of game theory, see [433]. See [135, 
136] for an outline on the history of indices. For a recent book on game 
theory, see [285]. 

The Shapley value was first defined in [357] (in 1953), and then specialized
into the Shapley-Shubik power index in [359] (1954). The Banzhaf
index was introduced in [31] (in 1965), and its nonstandardized definition was
proposed by Dubey and Shapley in [108] (in 1979) as a modification of
the standardized one. Nevertheless, in 1946, L. S. Penrose had already
published the index in [319] (which was introduced into mainstream research
by Morriss in 1987 [274]). The Banzhaf index was characterized by
Dubey (see [107, 108]). [27] studies indices for non-atomic games, that
is, N-player games for nonfinite N or, equivalently, fuzzy measures over
a nonfinite set. Appendix A in [27] reviews finite games and their values
(the Shapley value for fuzzy measures). For a detailed account of the
Shapley value and related issues, see the book edited by Roth [336]. [357]
and [359] are reprinted there.

At present, a large number of power indices have been defined in the
literature of game theory, e.g., the Deegan-Packel index [90], the Coleman
index [81] (reprinted in [82]), and the Executive power index (by Colomer
and Martinez [83, 84]). While these indices only rely on the measure/game
$\mu$, other indices have been defined in game theory that consider some
additional information (e.g., whether the individuals in a set are connected).
This is the case with the Garrett and Tsebelis index [159, 160, 161]. See [19]
for a review of the indices. Felsenthal and Machover (1998) [135] give a
critical perspective of the area (see also Morriss, 1987 [274]).

Similarities and relationships between the indices have also been stud- 
ied, and paradoxes have been analyzed. For example, [44] and [45] proved 
that in some circumstances the indices lead to unsatisfactory results. Straf- 
fin in [379] and [380] compares the Shapley-Shubik and Banzhaf power in- 
dices. He gives an example showing that the two indices can yield different 
results, and even rank the power of the voters differently. Straffin argues 
that the comparison of the two indices in terms of the order in which 
winning coalitions are formed is misleading. He introduces the interpreta- 
tion based on probability models and the homogeneity and independence 
assumptions. So, the order plays no part in the interpretations. He also 
studies alternative situations, such as partial homogeneity assumptions, 
which can model the situation where voters can be partitioned into groups 
(e.g., parties), and there is homogeneity between two members of the same 
group but not between two members of two different groups. Examples of 
paradoxes of the indices can be found in Felsenthal and Machover [135]. 

For Web resources, see [311] (it includes software for computing power 
indices) and [310]. 

The Shapley value for fuzzy coalitions has been studied by several
researchers. See [53] and [420].

• Interaction index: The interaction index for sets of two elements was
introduced by Owen (1972) [309] (Section 5), and later rediscovered by
Murofushi and Soneda [278]. The extension for an arbitrary number of
sources is due to Grabisch [169]. The interaction index presented here
generalizes the Shapley value. Accordingly, it can be cited as the Shapley
interaction index. Other interaction indices are also possible. For example,
Roubens in [337] introduced the Banzhaf interaction index that extends
the Banzhaf value. Some characterizations of interaction indices are given
in [152, 153, 154].

• Average value, degree of conjunction, degree of disjunction or
orness: The average values for minimum and maximum were given by
Dujmović in [101]. Orness and andness concepts were introduced by
Dujmović in 1973 and 1974. The orness distribution function (called local
orness) and the orness average value (called mean local orness) were
introduced in [100]. Then, the orness (as defined in Definition 7.20) was
introduced in [102]. [102] gives, among other results, the orness of the
geometric mean. Numerical computations for some root-mean-powers are
also included. [103] includes a few additional results. The orness for the
quasi-arithmetic mean is studied in [230]. [230] presents an alternative
definition for measuring orness that uses the generator of the
quasi-arithmetic mean.

Independently, Yager defined orness and andness in [442] for the OWA 
operator. Yager and Rybalov defined in 1996 the orness and andness of 
uninorms UN, and U N* in [457]. Marichal, 1998, gives in [245] the general 
definition of the degree of disjunction and finds an expression (in Propo- 
sition 7.23) for the orness of the Choquet integral. Marichal also proves 
the consistency of Yager's expression for the OWA orness with Dujmović's
degree of disjunction. Before Marichal, in 1994, Fodor and Roubens [146]
had already drawn attention to Dujmović's work. More recently Fernández
Salido and Murakami [138] reintroduced the orness distribution functions 
and the orness average value. We use their terminology in this book. 

The orness for fuzzy quantifiers was first defined by Yager in [449].
Torra [405] uses this expression to study the orness of Sugeno
$\lambda$-quantifiers. [59] introduced an alternative expression for the
orness of quantifiers that was proved to be equivalent to Yager's in [453].
See also [57]. The equivalent expression is:

\[ orness(Q) = 1 - \int_0^1 x Q'(x)\,dx = 1 - \int_0^1 x q(x)\,dx, \]

where $Q$ is the fuzzy quantifier and $q$ its generating function.
[443] defines an ordinal OWA and a nonnumerical orness (orness on an
ordinal scale). [133] discusses this ordinal orness.


• Orness interpretation: Orness as compensation is considered in [396].
A similar concept, used in [245], is orness as a degree of tolerance. [301]
and [444] interpret orness as optimism and andness as pessimism, which
is used to define the pessimism-optimism index criterion of Hurwicz. See
[131, 132] or Luce and Raiffa [239] for a description of this criterion. [22]
gives an interpretation of the orness in terms of the number of disjunctions
in a statement. Dujmović [102] and Fernández Salido and Murakami [138]
use dissimilarity with the minimum as orness. [138] and [245] review some
interpretations of orness.


• Entropy and dispersion: For the definition of entropy and its
properties, see [356] and [25]. Some characterizations of these measures can
be found in [10] and [114]. The use of entropy [356] for evaluating the
dispersion of the weights appears, among others, in [301, 302, 60, 442, 400].
The first use with OWA weights was in Yager's 1988 paper [442]. In these
works, entropy is used for measuring dispersion of a weighting vector
either for the weighted mean or for the OWA operator.

For fuzzy measures, original definitions for entropy were given by Yager
in [448] and Marichal in [245] (see also [248]). Yager's definition
corresponds to the upper entropy, and Marichal's corresponds to the lower
entropy. The names lower and upper entropy are due to Marichal and Roubens.
In [254], they compare the two definitions and prove, among other things,
that the upper entropy always leads to a value larger than the lower one
(and that equality is obtained only for additive fuzzy measures). The
names for the upper and lower entropies naturally follow from this
property.

Entropy for fuzzy measures as defined in this chapter assumes values
in [0,1]. Marichal and Roubens defined, also in [254], entropy suitable for
ordinal fuzzy measures. The authors state that this definition is linked
to the Sugeno integral (as both the entropy and the integral are especially
suited for working in ordinal scales).

Let $\mu$ be defined in $L = \{l_0, \dots, l_r\}$, with
$l_0 <_L l_1 <_L \cdots <_L l_r$; then, the ordinal entropy $E_L$ of $\mu$ is
defined by $E_L(\mu) = l_{|R|-2}$, where $R = \{\mu(A) \mid A \subseteq X\}$.
The rationale for this definition is that the entropy $E_L(\mu)$ measures the
diversity of the coefficients used in the fuzzy measure $\mu$. Note that the
measure is based on the number of different terms used in $\mu$, less two.
Among the properties of $E_L$, we have that its minimum is $l_0$ and its
maximum is $l_{r-1}$ (or $l_{2^N-2}$, if $2^N < r$). Moreover, the entropy for
WMax, WMin, OWMax, and OWMin using a weighting vector
$u = (u_1, \dots, u_N)$ is $E_L(u) := l_{|\{u_1, \dots, u_N\}|-2}$ (again, the
number of different terms now in $u$).

The use of variance to measure variability can be found in [446], but
the concept is slightly different. Fullér and Majlender [157] use the same
approach as defined here.


8 
Selection of the Model 


The moment of truth is a running program. 


H. A. Simon, [362] (p. 96) 


When an application needs a fusion mechanism, the developer has to solve an
essential problem: the construction of the appropriate model. This corresponds
to (i) the selection of an aggregation operator and (ii) the determination of
its parameters. This process should take into account several factors. Some of
them are highlighted here.


Mathematical properties: The operator should be selected taking into ac- 
count the desired properties of the model. Characterizations can help in 
the selection. Indices and any other behavioral analysis, e.g., orness or 
breakdown points (if we are interested in robust behavior), can also be 
useful. 

Interpretability: An expert or user needs to understand the model. As the 
model consists of an operator and its parameters, interpretability is appli- 
cable to both operators and parameters. In the case of complex operators 
such as fuzzy integrals, tools for analyzing the parameters (as with power 
indices) can improve the interpretability of the model. 

Adaptability: The environment of any system changes with respect to time. 
Selection should consider whether the operator will still be valid after 
some time of operation (say, with a readjustment of its parameters). 


An additional factor to be taken into account is simplicity. We have
described in previous chapters a broad variety of operators. The simplest
ones have a limited descriptive capability (e.g., the arithmetic mean), while
the more complex ones (e.g., the Sugeno integral) have additional
capabilities. Such additional capabilities come at the price of more
parameters (e.g., $2^N - 2$ values for any fuzzy integral).

In general, other factors being equal, the simpler the model, the better.
Operators that are too simple might produce unsuccessful results, whatever
the parameters used. However, unnecessarily complex operators might cause
overfitting to a particular situation or to a particular data set.

In the rest of this chapter we will consider the problem of parameter deter- 
mination. At present, there exist several methods for this purpose, and they 
can be roughly classified into the following two classes. 


Methods based on an expert: Two main alternatives exist in which the help
of an expert is fundamental for fixing the parameters. One alternative is
that the expert (almost) directly supplies the required parameters. The
Analytic Hierarchy Process is an example of this approach. The method
permits us to elicit the weights for operators like the weighted mean.
In short, an expert is interviewed, and from pairwise comparison of the
sources the weights are calculated. This method permits inconsistencies to
some extent. Another alternative is that the expert supplies some relevant
information that is later used for parameter determination. This is the
case of parameter determination from orness or compensation.

Methods based on data: Parameters are learned from a set of examples. One 
alternative consists of having preferences or a partial order of the exam- 
ples (or of the outcomes of the aggregation); for example, we may prefer 
example 1 to example 5, or the outcome of example 1 may be larger than 
that of example 5. The order is not required to be a total order. Another 
alternative consists of having examples where each is defined in terms of 
the inputs of the model as well as the intended output for the inputs. 


In this chapter we will review methods of the two classes. First, we will 
consider the Analytic Hierarchy Process. Then, we will describe the method 
to determine OWA weights from orness (or compensation) and dispersion. 
Finally, we will review some methods to determine parameters from examples. 


8.1 Analytic Hierarchy Process 


The Analytic Hierarchy Process (AHP) was designed to derive ratio scales. 
Therefore, AHP is a valid methodology for weight determination for operators 
with weights represented in ratio scales. 

Weighted means and OWA operators are examples of such operators.
In their weighting vectors, the relevant aspect is the relationship (the ratio)
between weights; that is, whether one weight is, for example, two times or
three times another weight. This is equivalent to expressing that one
information source (sensor or expert) is two times or three times more
relevant or important than another. Besides, there is an absolute zero for
weights. Note also that the exact value of a weight is not very relevant for
expressing importance, and that any transformation that keeps the ratio would
be, in principle, valid. The requirement of all weights adding to 1 permits
unanimity to hold. The difference between weights $w$ adding to 1 and weights
$w'$ adding to $r$ is a multiplication factor of $r$ in the outcome. All these
properties imply a ratio scale.

Importance   Meaning
1            Equal importance
3            Moderate importance
5            Essential or strong importance
7            Very strong importance
9            Extreme importance

Table 8.1. Scale for intensity of importance

In a given problem, the first step in the AHP is to formalize the structure
of the objects under consideration, establishing their relationships. Typically,
such a structure corresponds to a hierarchy or a network. In the case of
simple weight determination, we will only consider a single set
$X = \{x_1, \dots, x_N\}$ consisting of the information sources. So, no
hierarchical structure or network is needed. More complex situations arise
when AHP is applied in a multicriteria decision making problem. In that case,
we have several criteria as well as several alternatives. Both the criteria
and the alternatives are used to define a network.

The next step is to compare each pair of objects. That is, in our case, to
compare the pairs of sources with respect to their importance. So, for each
pair $x_i$, $x_j$ in $X$, we assign a value representing whether the
importance of $x_i$ is larger than, equal to, or lower than that of $x_j$.
Let $a_{ij}$ be this value. Then, $a_{ij}$ corresponds to the ratio of the
importance of $x_i$ to that of $x_j$. It is usual to use the 1-9 scale (see
Table 8.1) to express importance. All comparisons $a_{ij}$ define a square
matrix, where $a_{ji} = 1/a_{ij}$ because the comparison satisfies
reciprocity.
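
The construction of the comparison matrix can be sketched in a few lines. The following is a minimal illustration, assuming the judgments are supplied only for pairs $(i, j)$ with $i < j$; the function name `reciprocal_matrix` and the three-source judgments are hypothetical and not part of the AHP literature.

```python
from fractions import Fraction


def reciprocal_matrix(n, judgments):
    """Build an n x n pairwise comparison matrix.

    judgments maps (i, j), with i < j, to the value a_ij. The diagonal is 1
    and a_ji = 1 / a_ij is filled in by reciprocity.
    """
    a = [[Fraction(1)] * n for _ in range(n)]
    for (i, j), value in judgments.items():
        a[i][j] = Fraction(value)
        a[j][i] = 1 / Fraction(value)
    return a


# Hypothetical judgments for three sources x1, x2, x3 on the 1-9 scale.
A = reciprocal_matrix(3, {(0, 1): 3, (0, 2): 5, (1, 2): 2})
```

Exact fractions are used here only so that reciprocal entries such as 1/3 are stored without rounding.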


Example 8.1. Let us consider the problem of selecting the most suitable stu- 
dent for a particular task in school. Selection will be based on his or her marks 
in the five subjects considered in Example 5.46. To rate the weight of each 
subject in the selection, we will use the AHP. 

Therefore, we need first to compare each pair of subjects using the 1-9
scale. To do so, we define $X$ such that, according to the requirements, it
corresponds to the set {ML, P, M, L, G}, where ML stands for Mathematical
Logic, P for Physics, M for Mathematics, L for Literature, and G for Greek.
Second, we consider the pairwise comparison of the subjects. According to the
particular task to be assigned to the student, we establish that


• ML is considered three times more relevant than P and two times more
relevant than M. In relation to Humanities, ML is considered seven times
more important than L and nine times more important than G.

• M is the second most important subject, and its importance is two times
that of P and three times that of L and G.


The pairwise comparisons, as well as all the other required comparisons, are
given in Table 8.2.


      ML     P      M      L      G
ML    1      3      2      7      9
P     1/3    1      1/2    2      3
M     1/2    2      1      3      3
L     1/7    1/2    1/3    1      2
G     1/9    1/3    1/3    1/2    1


Table 8.2. Pairwise comparison matrix for {ML, P, M, L, G}


When the values of $a_{ij}$ are in a ratio scale, they should satisfy
$a_{ik} = a_{ij} a_{jk}$. A matrix satisfying this equality is said to be
consistent. In a consistent matrix, all rows are equal except for a
multiplicative factor. That is, for two rows $r$ and $s$, we have that
$a_{sj} = k\,a_{rj}$ for a certain value $k$. Specifically,
$a_{sj} = a_{1j}/a_{1s}$. By reciprocity, the same holds for columns. For such
consistent matrices, weights can be obtained by taking any column and
normalizing it. That is, when $N$ sources are under consideration, we define
the weights as $w_i = a_{is} / \sum_{k=1}^{N} a_{ks}$, for any column $s$.
It is simple to prove that the result does not depend on the column selected.
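
The consistency condition can be checked numerically. The sketch below assumes that $a_{ij}$ is the ratio of the importance of $x_i$ to that of $x_j$, so that a consistent matrix has the form $a_{ij} = w_i / w_j$; under this convention a normalized column reproduces the weights. The weights and the helper names `is_consistent` and `weights_from_column` are hypothetical.

```python
def is_consistent(a, tol=1e-9):
    """Return True when a_ik == a_ij * a_jk holds for all i, j, k."""
    n = len(a)
    return all(
        abs(a[i][k] - a[i][j] * a[j][k]) <= tol
        for i in range(n) for j in range(n) for k in range(n)
    )


def weights_from_column(a, s=0):
    """Normalize column s of a consistent matrix to obtain the weights."""
    column = [row[s] for row in a]
    total = sum(column)
    return [value / total for value in column]


# Hypothetical weights used to build a consistent matrix a_ij = w_i / w_j.
w = [0.5, 0.3, 0.2]
A = [[wi / wj for wj in w] for wi in w]
recovered = [weights_from_column(A, s) for s in range(3)]
```

Every entry of `recovered` is the same vector, which illustrates that the choice of $s$ does not matter for a consistent matrix.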

Nevertheless, AHP permits us to obtain weighting vectors even in the 
case of inconsistent matrices. This is common in real practice. The weights 
are obtained from the principal eigenvector once it is normalized. We will 
illustrate the process with the matrix defined in the previous example. Note 
that the matrix is inconsistent. 


Example 8.2. Let us consider the pairwise comparison matrix in Example 8.1. 
The weighting vector associated with this matrix (Table 8.2) is obtained as 
follows. 


1. Obtain the principal eigenvector of the matrix. This vector corresponds 
to 
(8.8920 2.7254 4.2718 1.4899 1). 


2. Normalize the vector. That is, divide each term by (8.8920 + 2.7254 + 
4.2718 + 1.4899 + 1): 
(0.4838104 0.148288 0.23242705 0.0810649 0.054409627). 


Therefore, the weights are p(ML) = 0.4838104, p(P) = 0.148288, p(M) = 
0.23242705, p(L) = 0.0810649, and p(G) = 0.054409627. Note that the weights 
roughly satisfy the constraints specified in the matrix. 
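
The two steps of Example 8.2 can be reproduced with power iteration, which is one common way (not necessarily the one used for the book's figures) to approximate the principal eigenvector of a positive matrix. The matrix entries below are an assumption: they follow the judgments stated in Example 8.1 and reciprocity, with the P versus L entry taken as 2 because that value is compatible with the eigenvector reported above.

```python
def principal_eigenpair(a, iterations=200):
    """Power iteration for a positive square matrix.

    Returns (eigenvalue, eigenvector), with the eigenvector normalized so
    that its components add to 1.
    """
    n = len(a)
    v = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        # v adds to 1, so the sum of A v estimates the principal eigenvalue.
        eigenvalue = sum(w)
        v = [x / eigenvalue for x in w]
    return eigenvalue, v


# Assumed comparison matrix for (ML, P, M, L, G).
A = [
    [1.0,   3.0,   2.0,   7.0, 9.0],
    [1 / 3, 1.0,   1 / 2, 2.0, 3.0],
    [1 / 2, 2.0,   1.0,   3.0, 3.0],
    [1 / 7, 1 / 2, 1 / 3, 1.0, 2.0],
    [1 / 9, 1 / 3, 1 / 3, 1 / 2, 1.0],
]
lambda_max, weights = principal_eigenpair(A)
```

Because the returned vector is already normalized, it plays the role of the weights $p(ML), \dots, p(G)$ directly.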


AHP offers a measure of consistency to evaluate the quality of the matrix.
This measure is based on the principal eigenvalue. This is so because it is
known that for a consistent positive reciprocal matrix of dimension $N$, its
principal eigenvalue is $N$, and that the value is larger than $N$ for
inconsistent matrices.


N    1   2   3      4      5      6      7      8      9      10
RI   0   0   0.58   0.90   1.12   1.24   1.32   1.41   1.45   1.49


Table 8.3. Random consistency indices (RI) for different dimensions N

When the principal eigenvalue of a matrix is $\lambda_{max}$, its consistency
index is defined as

\[ CI = \frac{\lambda_{max} - N}{N - 1}. \]

Additionally, there is a consistency ratio that compares the consistency
index with an average of consistency indices computed from random matrices
of different dimensions. Formally, the ratio is defined by

\[ CR = CI / RI, \]

where RI is the Random Consistency Index, which depends on the dimension
$N$ of the matrix. Table 8.3 gives such indices RI for different values
of $N$.

The CR is a normalized value. For values of CR larger than 0.10, revising
the original matrix is recommended.


Example 8.3. The consistency index for the matrix in Example 8.1 is computed
from the principal eigenvalue (that is, $\lambda_{max} = 5.0654$), as follows:

\[ CI = \frac{5.0654 - 5}{5 - 1} = 0.01635003. \]

The consistency ratio is

\[ CR = \frac{0.01635003}{1.12} = 0.014598242. \]
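
As a quick check of these two figures, the definitions of CI and CR can be coded directly. This is a small sketch; the dictionary simply transcribes Table 8.3, and the function names are ours.

```python
# Random consistency indices from Table 8.3, indexed by the dimension N.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def consistency_index(lambda_max, n):
    """CI = (lambda_max - N) / (N - 1)."""
    return (lambda_max - n) / (n - 1)


def consistency_ratio(lambda_max, n):
    """CR = CI / RI; undefined for the dimensions where RI is zero."""
    ri = RANDOM_INDEX[n]
    if ri == 0.0:
        raise ValueError("CR is undefined when RI = 0 (N = 1 or N = 2)")
    return consistency_index(lambda_max, n) / ri


ci = consistency_index(5.0654, 5)
cr = consistency_ratio(5.0654, 5)
needs_revision = cr > 0.10
```

With the rounded eigenvalue 5.0654 the results agree with the example up to the rounding of the eigenvalue, and the ratio is well below the 0.10 threshold.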


8.2 OWA Weights from Orness 


In Section 7.5, we described some measures and indices for aggregation op- 
erators. Orness or degree of disjunction is one such measure. This measure, 
defined as a similarity to the maximum operator, can be understood as com- 
pensation or optimism. 

One approach for defining the parameters of an operator is to fix some
index or measure, and then to select a parameter satisfying it. The orness,
due to its simple interpretation, is especially suitable for this purpose.
We illustrate the process by considering how to define the parameter for the
OWA operator when the weights are determined from a fuzzy quantifier.
Nevertheless, the same approach can be applied for other operators as well.

In the particular case of OWA with a fuzzy quantifier $Q$, we have that the
orness of the OWA operator corresponds to (Equation 7.25)

\[ orness(Q) = \int_0^1 Q(x)\,dx. \tag{8.1} \]

Then, the determination of the parameter corresponds to finding the
quantifier $Q$ such that $orness(Q) = \delta$ for a degree of orness equal to
$\delta$ (supplied by the expert). The following example illustrates this
fact.


Example 8.4. The problem consists of determining a quantifier of the Yager
family ($Q_\alpha(x) = x^\alpha$) for the OWA operator when an expert supplies
a degree of orness or compensation equal to 0.2. That is, $\delta = 0.2$.

According to Proposition 7.28, the orness of $Q_\alpha$ is given by

\[ orness(Q_\alpha) = \frac{1}{\alpha + 1}. \]

Therefore, the problem is to find $\alpha$ such that

\[ \frac{1}{\alpha + 1} = \delta. \]

So, $\alpha = (1 - \delta)/\delta = (1 - 0.2)/0.2 = 4$.
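
The computation in this example is easy to script. The sketch below assumes the usual construction of OWA weights from a quantifier, $w_i = Q(i/N) - Q((i-1)/N)$, and the orness of a weighting vector $\frac{1}{N-1}\sum_i (N-i) w_i$; the function names are hypothetical.

```python
def yager_alpha_from_orness(delta):
    """Solve 1 / (alpha + 1) = delta for alpha, with 0 < delta <= 1."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    return (1.0 - delta) / delta


def owa_weights_from_quantifier(q, n):
    """OWA weights w_i = Q(i/n) - Q((i-1)/n) for i = 1, ..., n."""
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def orness(weights):
    """Orness of an OWA weighting vector."""
    n = len(weights)
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)


alpha = yager_alpha_from_orness(0.2)


def quantifier(x):
    """Yager quantifier Q_alpha(x) = x ** alpha."""
    return x ** alpha


weights_5 = owa_weights_from_quantifier(quantifier, 5)
weights_1000 = owa_weights_from_quantifier(quantifier, 1000)
```

Note that the degree 0.2 refers to the quantifier itself. For a finite number of sources the orness of the resulting weighting vector is only close to it (0.1416 for five sources), and it approaches 0.2 as the number of sources grows.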


Proposition 7.29, which establishes that the orness of Sugeno
$\lambda$-quantifiers and of Yager $\alpha$-quantifiers are decreasing
functions, is useful when defining the parameters of the OWA operator,
because it implies that, for a single degree of orness $\delta$, there is a
single quantifier.

However, in general, the degree of disjunction does not determine the
parameters uniquely. For example, if we consider OWA operators with weighting
vectors, there exist several weighting vectors with orness equal to 0.5. In
this case, solutions can be further constrained by adding additional
requirements. A usual constraint is to require maximum dispersion among the
weights. Other constraints might also be of interest, for example, selecting
weights so that the robustness of the operator is maximized.


8.2.1 Orness and Dispersion 


The rationale of this approach is to assume that, when several weighting
vectors are equally good, it is better to distribute the weights as much as
possible among the sources instead of accumulating them in a single source.
For example, in the case of the OWA operator the two weighting vectors
(0, 1/2, 1/2, 0) and (1/4, 1/4, 1/4, 1/4) have the same orness (equal to 0.5).
We prefer the second vector, as all sources contribute to the output. As seen
in Section 7.3, the dispersion of a weighting vector can be measured using
entropy. So, the constraint of maximum dispersion corresponds to maximum
entropy. The same can be applied to operators other than the OWA operator. In
the case where fuzzy measures are used, the fuzzy entropy for fuzzy measures
defined in Section 7.3.1 can be used.

With regard to a degree of disjunction $\delta$ and maximum dispersion, the
problem of weight determination is formulated as follows:


    Maximize    dispersion
    Subject to  $\delta$ = orness
                $w$ is a weighting vector


For the OWA operator, it is formulated as follows:


    Maximize    $-\sum_{j=1}^{N} w_j \ln w_j$
    Subject to  $\delta = \frac{1}{N-1} \sum_{i=1}^{N} (N - i) w_i$
                $\sum_{i=1}^{N} w_i = 1$, $w_i \ge 0$ for $i = 1, \dots, N$


The following result permits us to compute the weights in the case of the 
OWA operator. 


Proposition 8.5. Given an OWA operator of dimension $N > 2$, and an orness
value equal to $\delta \in [0,1]$, the mathematical programming problem given
above has a unique solution given by the weighting vector $w$ with

\[ w_j = \frac{t^{j}}{\sum_{i=1}^{N} t^{i}} \]

for all $j = 1, \dots, N$, where $t$ is the only real positive zero of the
equation

\[ a z^{N-1} + (a+1) z^{N-2} + \cdots + (a+N-2) z + (a+N-1) = 0, \]

with $a = -\delta(N - 1)$.


In the case of $N = 2$, the problem formulated above corresponds to
$w_1 = \delta$ and $w_2 = 1 - \delta$. An OWA operator with weights of this
form is called Maximum Entropy OWA (ME-OWA).
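
A numerical sketch of the maximum-entropy weights follows. It does not solve the polynomial equation of the proposition directly. Instead, it assumes that the solution is geometric, $w_j \propto t^j$, and finds $\log t$ by bisection on the orness constraint, using the fact that the orness of such a vector decreases as $t$ grows. The function names and the bracketing interval are our own choices.

```python
import math


def orness(weights):
    """Orness of an OWA weighting vector: (1/(N-1)) * sum_i (N - i) w_i."""
    n = len(weights)
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)


def geometric_weights(log_t, n):
    """Weights w_j proportional to t**j, computed stably from log t."""
    exponents = [log_t * j for j in range(1, n + 1)]
    top = max(exponents)
    raw = [math.exp(e - top) for e in exponents]
    total = sum(raw)
    return [x / total for x in raw]


def me_owa_weights(delta, n, tol=1e-12):
    """Maximum-entropy OWA weights with orness delta (0 < delta < 1, n >= 2)."""
    if n < 2 or not 0.0 < delta < 1.0:
        raise ValueError("need n >= 2 and 0 < delta < 1")
    lo, hi = -50.0, 50.0  # bracket for log t
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if orness(geometric_weights(mid, n)) > delta:
            lo = mid  # orness too high: a larger t is needed
        else:
            hi = mid
    return geometric_weights((lo + hi) / 2.0, n)


w = me_owa_weights(0.2, 5)
t = w[1] / w[0]  # ratio between consecutive weights
```

For an orness of 0.5 the procedure returns the uniform vector, and for $N = 2$ it returns $(\delta, 1 - \delta)$, in agreement with the remark above.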
$a_1^1$   $a_2^1$   ...   $a_N^1$   |   $b^1$
$a_1^2$   $a_2^2$   ...   $a_N^2$   |   $b^2$
  ...       ...     ...     ...     |    ...
$a_1^M$   $a_2^M$   ...   $a_N^M$   |   $b^M$


Table 8.4. Data examples


8.3 Extracting Parameters from Examples: Expected 
Outcome 


Now, we consider parameter determination when there is a set of examples for 
fixing the model and the examples are defined by (input, output) pairs. This 
problem can either be expressed as an optimization problem or as a learning 
problem. 

We will present in this section a mathematical formulation of the problem.
Then, in the following sections, we will describe some particular methods that
can be used for some particular aggregation operators.


Formulating the optimization problem 


As stated, we consider examples defined by (input, output) pairs. Therefore,
examples follow the structure described in Table 8.4. That is, there are $M$
different examples, each consisting of the values supplied by $N$ information
sources and the correct outcome that we intend to approximate from the
values. Therefore, each example consists of $N + 1$ values, with
$(a_1^j\ a_2^j \dots a_N^j \mid b^j)$ denoting the values for the $j$th
example. Here, $a_i^j$ is the value supplied by the $i$th information source
(say, $x_i$), and $b^j$ is the ideal outcome for the same example. $A$ denotes
the matrix $A = \{a_i^j\}$, and $b$ is the vector such that
$b' = (b^1 \dots b^M)$. We use $b'$ to denote the transpose of the vector $b$.
Again, we use $X = \{x_1, \dots, x_N\}$ to denote the information sources, and
the function $f^j$ to denote that $a_i^j = f^j(x_i)$ is the value supplied by
$x_i$ for the $j$th example.

Given a set of examples, and assuming that the aggregation function $C$ is
known, the goal is to determine the parameters of $C$ given $A$ and $b$. When
$C$ is the Weighted Mean (WM), the problem is to find a weighting vector $p$
for the WM so that the difference between $b^j$ and the estimated value for
the $j$th example is minimum. Similarly, if $C$ is the Choquet integral, then
the goal is to find a fuzzy measure $\mu$ that also minimizes such a
difference.

Naturally, to establish this problem properly we need a way to measure
the difference between the estimated value and the correct outcome. In the
case of the $j$th example, this corresponds to the difference between $b^j$
and $C(f^j(x_1), \dots, f^j(x_N))$. One alternative (and the one most used) is
to take the squared difference (i.e., $distance(x, y) = (x - y)^2$). In this
case, the best model is the one that minimizes

\[ D_C(parameters(C)) = \sum_{j=1}^{M} (C(f^j(x_1), \dots, f^j(x_N)) - b^j)^2. \tag{8.2} \]

We will simplify this notation, denoting the parameters of $C$ by $P$, and
then using $C_P$ to express that the aggregation operator depends on $P$.
Thus, instead of the last expression, we will use

\[ D_C(P) = \sum_{j=1}^{M} (C_P(f^j(x_1), \dots, f^j(x_N)) - b^j)^2. \tag{8.3} \]

Typically, as $C$ is assumed to be known and fixed, the goal is to obtain the
minimum $D_C(P)$ over the possible parameters $P$. The minimization of this
equation results in a least sum of squares (see Section 2.2.5). Nevertheless,
this is a constrained least squares problem, because parameters have to
satisfy particular constraints that depend on their nature. For example,
weights in weighting vectors have to add to 1 and be nonnegative. So, in
general, the problem to be solved takes the following form:


    Minimize    $D_C(P) = \sum_{j=1}^{M} (C_P(a_1^j, \dots, a_N^j) - b^j)^2$
    Subject to  logical constraints on $P$                              (8.4)
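
The objective in Expression 8.4 can be written once for any operator. The following sketch is only illustrative: the examples are hypothetical, and the two operators (weighted mean and OWA) are coded in their usual form so that the same function `squared_error` evaluates $D_C(P)$ for both.

```python
def weighted_mean(p, values):
    """WM_p(a_1, ..., a_N) = sum_i p_i a_i."""
    return sum(pi * a for pi, a in zip(p, values))


def owa(w, values):
    """OWA_w: the weights are applied to the values in decreasing order."""
    return sum(wi * a for wi, a in zip(w, sorted(values, reverse=True)))


def squared_error(operator, params, examples):
    """D_C(P) = sum_j (C_P(a^j_1, ..., a^j_N) - b^j)^2."""
    return sum((operator(params, inputs) - target) ** 2
               for inputs, target in examples)


def is_weighting_vector(p, tol=1e-9):
    """Constraints for weights: p_i >= 0 and sum_i p_i = 1."""
    return all(pi >= -tol for pi in p) and abs(sum(p) - 1.0) <= tol


# Hypothetical examples: (inputs, intended output).
examples = [((0.8, 0.2, 0.5), 0.6),
            ((0.1, 0.9, 0.4), 0.3),
            ((0.5, 0.5, 0.5), 0.5)]
p = (0.6, 0.2, 0.2)
error_wm = squared_error(weighted_mean, p, examples)
error_owa = squared_error(owa, p, examples)
```

Here the same vector fits the examples much better as a weighted mean than as an OWA weighting vector, which is the kind of comparison the distance $D_C$ is meant to support.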


When the aggregation operator is the weighted mean with a weighting
vector $p = (p_1, \dots, p_N)$, we have $\sum_i p_i = 1$ and $p_i \ge 0$, and,
thus, the problem corresponds to


    Minimize    $D_{WM}(p = (p_1, \dots, p_N)) = \sum_{j=1}^{M} (WM_p(a_1^j, \dots, a_N^j) - b^j)^2$
    Subject to  $\sum_{i=1}^{N} p_i = 1$
                $p_i \ge 0$                                             (8.5)

Let us consider an example of weight determination for the weighted mean. 


Example 8.6. Let us consider the problem of giving a global score to students 
in a school in terms of their marks in the five subjects considered in Ex- 
ample 5.46; that is, the marks in Mathematical Logic (M L), Physics (P), 
Mathematics (M), Literature (L), and Greek (G). 

Then, let us consider a set of 10 students $\{s_1, \dots, s_{10}\}$, for whom
the marks in all five subjects are known, and let us consider a subjective
overall rating. Such marks and overall ratings are given in Table 8.5.

Student   ML   P   M   L   G   Subjective evaluation

0.9 0.8 0.1 0.1
0.6 0.9 0.2 0.3
0.7 0.7 0.2 0.6
0.9 0.9 0.4 0.4
0.6 0.3 0.9 0.9
0.4 0.2 0.8 0.1
0.2 0.4 0.1 0.2
0.3 0.3 0.8 0.3
0.2 0.1 0.2 0.1
0.2 0.2 0.5 0.1

Table 8.5. Marks given to ten students, and their subjective evaluation


Now, if we use the weighted mean as a model for our subjective rating,
then we can use the problem formalized in Equation 8.5 for determining the
weights. In this case, with $p = (p_{ML}, p_P, p_M, p_L, p_G)$, and using the
data in Table 8.5, the previous problem is rewritten as


    Minimize    $D_{WM}(p) = \sum_{j=1}^{10} ((p_{ML} a_{ML}^j + p_P a_P^j + p_M a_M^j + p_L a_L^j + p_G a_G^j) - b^j)^2$
    Subject to  $p_{ML} + p_P + p_M + p_L + p_G = 1$
                $p_{ML} \ge 0$, $p_P \ge 0$, $p_M \ge 0$, $p_L \ge 0$, $p_G \ge 0$     (8.6)

The optimal solution of this quadratic problem subject to linear constraints
gives the following weights:


$p_{ML} = 0.4244$, $p_P = 0.4108$, $p_M = 0.0000$, $p_L = 0.1249$ and $p_G = 0.0399$.


So, this model implies that there are two main marks that are considered
relevant in our subjective evaluation: the one for ML and the one for P.


Dealing with multiple solutions 


Before going into the details on the computation of a solution for the
optimization problem, it is important to note that the minimization problem
does not always lead to a single solution. Instead, several solutions can
exist with the same distance $D_C(P)$.

This situation is similar to the one we found in the case of parameter
determination from a given orness. Again, we can consider an additional
measure such as dispersion. So, given several parameters $P$ with the same
$D_C(P)$, we select the one that maximizes dispersion. In this case, the
problem to be solved is formalized as follows.


(i) Find a solution $P$ of the minimization problem in Expression 8.4, and
then define $\Delta$ as the error for this parameter $P$. That is,
$\Delta = D_C(P)$.
(ii) Solve the following problem:


    Maximize    dispersion($P$)
    Subject to  logical constraints on $P$
                and $\Delta = D_C(P)$,

where dispersion is measured in terms of entropy.


8.3.1 Weighted Mean 


The simplest problem for learning weights is when the aggregation operator
corresponds to the weighted mean. In this case, using the Euclidean norm
$\|x\| = \sqrt{x'x}$, we can express $D_{WM}(p)$ as
$D_{WM}(p) = \|Ap - b\|^2 = (Ap - b)'(Ap - b)$. Expanding this expression
gives $p'A'Ap - 2b'Ap + b'b$, so it attains its minimum where
$(1/2)p'Qp + r'p$ does, with $Q = A'A$ and $r = -A'b$. We can therefore
reformulate the problem given above in Expression 8.5 as follows:


    Minimize    $(1/2)p'Qp + r'p$
    Subject to  $\sum_{i=1}^{N} p_i = 1$
                $p_i \ge 0$


From now on, we consider this problem as the one to minimize. Nevertheless,
for the sake of clarity, we use $D_{WM}(p)$ to denote the difference between
the model using $p$ and the examples, instead of using the new minimization
function $(1/2)p'Qp + r'p$.
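
The relation between the two objective functions can be checked numerically. The sketch below uses hypothetical data and assumes $Q = A'A$ and $r = -A'b$, the sign for which the identity $D_{WM}(p) = 2\,((1/2)p'Qp + r'p) + b'b$ holds; the small matrix helpers are written out so that no external library is needed.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quadratic_form(A, b):
    """Return Q = A'A and r = -A'b for the objective (1/2) p'Qp + r'p."""
    At = transpose(A)
    Q = matmul(At, A)
    r = [-x for x in matvec(At, b)]
    return Q, r


def d_wm(A, b, p):
    """D_WM(p) = ||Ap - b||^2."""
    return sum((est - target) ** 2 for est, target in zip(matvec(A, p), b))


# Hypothetical data: three examples (rows) and three sources (columns).
A = [[0.8, 0.2, 0.5], [0.1, 0.9, 0.4], [0.5, 0.5, 0.5]]
b = [0.6, 0.3, 0.5]
p = [0.6, 0.2, 0.2]
Q, r = quadratic_form(A, b)
objective = 0.5 * dot(p, matvec(Q, p)) + dot(r, p)
```

Since the two functions differ only by the factor 2 and the constant $b'b$, they share the same minimizers over any feasible set.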

The new formulation of the problem shows clearly that this is a typical 
optimization problem: a quadratic program with linear inequality constraints. 
There exist several algorithms to solve this class of problems. We consider here 
two different approaches. First, a method that is a variation of the gradient 
descent adapted for the weighted mean, and, second, a method that is generic 
and can be applied to any kind of optimization problem with a quadratic 
function and linear inequality constraints. The latter method can be applied 
to the case of the weighted mean. 

Example 8.6 given above is an example of the optimization problem. 


Using gradient descent 


The key element of this approach is to reformulate the problem so that no
equality or inequality constraint is needed. So, we drop constraints
$p_i \ge 0$ and $\sum_i p_i = 1$. This is achieved considering an
unconstrained vector $\Lambda = (\lambda_1, \dots, \lambda_N)$ from which the
constrained weights $p_i$ are extracted as follows:

\[ p_1 = \frac{e^{\lambda_1}}{\sum_{j=1}^{N} e^{\lambda_j}}, \ \dots, \ p_N = \frac{e^{\lambda_N}}{\sum_{j=1}^{N} e^{\lambda_j}}. \tag{8.7} \]

Note that, in this way, any vector $\Lambda \in \mathbb{R}^N$ leads to a valid
weighting vector. That is, for all $\Lambda$, we have that all $p_i$
constructed using Equation 8.7 satisfy $p_i \ge 0$ and $\sum_i p_i = 1$.
Therefore, the problem of learning weights for the WM operator is equivalent
to the following unconstrained minimization problem:

\[ \text{Minimize } D_{WM}(\Lambda) = \sum_{j=1}^{M} (WM_{p(\Lambda)}(a_1^j, \dots, a_N^j) - b^j)^2. \tag{8.8} \]

That is,

\[ \text{Minimize } D_{WM}(\Lambda) = \sum_{j=1}^{M} \left( \sum_{i=1}^{N} \frac{e^{\lambda_i}}{\sum_{k=1}^{N} e^{\lambda_k}} a_i^j - b^j \right)^2. \tag{8.9} \]


To apply gradient descent to this problem, an iterative process is applied
where one of the examples is considered at each step, and the parameters
$\lambda_i$ are updated according to the error for the example. So, given the
$j$th example

\[ (a_1^j\ a_2^j \dots a_N^j \mid b^j), \]

we define the error $e^j$ as

\[ e^j = \frac{1}{2} \left( \sum_{i=1}^{N} \frac{e^{\lambda_i}}{\sum_{k=1}^{N} e^{\lambda_k}} a_i^j - b^j \right)^2. \]

Then, given the parameters $\lambda_i$ at time $t$ (such parameters are
expressed by $\lambda_i(t)$), the new parameters $\lambda_i(t+1)$ are defined
as follows:

\[ \lambda_i(t+1) := \lambda_i(t) - \beta \frac{\partial e^j}{\partial \lambda_i} \]

for $i = 1, \dots, N$, where $\beta$ is a learning rate (a small value
$0 < \beta < 1$). This expression is equivalent to

\[ \lambda_i(t+1) = \lambda_i(t) - \beta \frac{e^{\lambda_i(t)}}{\sum_{k=1}^{N} e^{\lambda_k(t)}} (a_i^j - \hat{b}^j)(\hat{b}^j - b^j), \tag{8.10} \]

where $\hat{b}^j$ is the estimate of $b^j$ at time $t$. That is,
$\hat{b}^j = WM_{p(t)}(a_1^j, \dots, a_N^j)$.
Algorithm 1 describes this algorithm.
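
A compact sketch of this procedure is given next. It is an illustration under several assumptions of ours: examples are visited in a fixed cyclic order, a fixed number of passes replaces the convergence test, and the data are hypothetical and noise-free (generated from a known weighting vector so that the result can be checked).

```python
import math


def softmax(lam):
    """Equation 8.7: p_i = exp(lam_i) / sum_k exp(lam_k)."""
    top = max(lam)
    raw = [math.exp(x - top) for x in lam]
    total = sum(raw)
    return [x / total for x in raw]


def learn_wm_weights(examples, beta=0.5, epochs=5000):
    """Per-example gradient descent on the unconstrained vector lam."""
    n = len(examples[0][0])
    lam = [1.0] * n
    for _ in range(epochs):
        for inputs, target in examples:
            p = softmax(lam)
            estimate = sum(pi * a for pi, a in zip(p, inputs))
            # Update rule of Equation 8.10 for every component of lam.
            lam = [x - beta * pi * (a - estimate) * (estimate - target)
                   for x, pi, a in zip(lam, p, inputs)]
    return softmax(lam)


# Hypothetical examples generated from the weights (0.5, 0.3, 0.2).
true_p = (0.5, 0.3, 0.2)
inputs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
          (0.5, 0.2, 0.9), (0.3, 0.8, 0.1), (0.9, 0.6, 0.4)]
examples = [(a, sum(p * x for p, x in zip(true_p, a))) for a in inputs]
learned = learn_wm_weights(examples)
```

Because the weights are produced by Equation 8.7, they are strictly positive at every step. A weight that should be exactly zero can therefore only be approached, which is one reason to consider the active set method as well.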


Algorithm 1 Gradient descent
Algorithm GradientDescent (A, b: Examples) returns weighting vector is
begin
    int t := 0;
    define $\Lambda(t)$ := (1 ... 1);
    while no convergence do
        Select example j
        $\hat{b}^j$ := $WM_{p(t)}(a_1^j, \dots, a_N^j)$
        Compute $\lambda_i(t+1)$ for i = 1, ..., N using Equation 8.10
        t := t + 1;
    end while
    Use Equation 8.7 to find weights p from $\Lambda$.
    return p;
end


Using active set methods


Another approach for solving the optimization problem is to use an algorithm
for quadratic programming. We consider an algorithm based on active set
methods. These methods exploit the fact that the computation of the solution
of quadratic problems with linear equality constraints is simple. Active set
methods are iterative methods in which, at each step, inequality constraints
are partitioned into two groups, those that are to be treated as active
(considered as equality constraints) and those that are treated as inactive
(essentially ignored). Once the partition is known, the algorithm proceeds by
moving on the surface defined by the working set of constraints (the set of
active constraints) to an improved point. In this movement, some constraints
are added to the working set and others are removed. Then, the algorithm
computes a new movement on the surface. This process is repeated until the
minimum is reached.

In our case, the inequality constraints are the ones that restrict the
weights to be nonnegative. Due to this, when the constraint corresponding to
the weight $p_i$ is active, the weight is forced to be zero. In contrast,
when a constraint is not active, the value of the corresponding weight is not
restricted. Thus, at a certain step, the working set is defined by the
initial equality constraint (all weights add to 1) and the active ones.

Assuming that, at the $k$th step, $p^k$ is a nonoptimal solution found in the
previous step, the movement on the surface described above consists of
modifying $p^k$ so that a better approximation (one with a smaller value of
the objective function) is obtained. That is, at step $k$, a vector $d^k$ is
computed that corresponds to the step to perform from the last solution
$p^k$. Therefore, the new solution $p^{k+1}$ is computed as
$p^{k+1} = p^k + d^k$. As $p^{k+1}$ is a weighting vector (i.e.,
$\sum_i p_i^{k+1} = 1$), and as $p^k$ is one also, the vector $d^k$ satisfies
$\sum_i d_i^k = 0$.

Hence, the vector $d^k$ is obtained as the solution of the following
optimization problem:

    Minimize   $(1/2)(d^k)' Q d^k + (g^k)' d^k$
    Subject to $c_i' d^k = 0$ for all $i \in W_k$,

where $g^k = r + Q p^k$ and $c_i$ is a vector with the coefficients of the
$i$th equality constraint in the working set at the $k$th step ($W_k$).
This problem can be solved with the following system of linear equations:

    $Q d^k + C \lambda = -g^k$
    $C' d^k = 0$,

where $\lambda$ are the Lagrange multipliers, and $C$ is the matrix formed
with all the coefficients of the active constraints.
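As a small worked illustration of this linear system, consider two weights
and a working set that contains only the constraint that the weights add to
one, so that $C = (1, 1)'$. The values of $Q$ and $g^k$ below are
hypothetical and chosen only to keep the arithmetic short.

```latex
% Hypothetical data: Q = 2I, g^k = (-2, -4)', C = (1, 1)'
\begin{aligned}
2 d_1 + \lambda &= 2, \\
2 d_2 + \lambda &= 4, \\
d_1 + d_2 &= 0 .
\end{aligned}
% Adding the first two equations and using d_1 + d_2 = 0 gives 2\lambda = 6, so
\lambda = 3, \qquad d^k = \left(-\tfrac{1}{2},\ \tfrac{1}{2}\right)' .
```

The step moves weight from the first source to the second one while keeping
the sum of the weights unchanged.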

To complete the method, we need to specify the procedures for adding a
constraint to the working set and for removing a constraint from it. Both are
considered in the algorithm that describes the whole method. The algorithm
requires an initial weighting vector $p^0$ that satisfies all the constraints
(a feasible solution). This is easy to find, as $p^0 = (1/N, \ldots, 1/N)$,
$N$ being the number of weights, is a suitable weighting vector. The
algorithm is given as Algorithm 2. The input data to this algorithm are the
matrix $Q$, the vector $r$, and the initial $p^0$. The result is the optimal
weighting vector and the corresponding $D_{WM}(p)$.
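The whole procedure can be sketched compactly in Python for the specific
constraints of this problem (weights add to one and are nonnegative). This
is a simplified sketch under stated assumptions, not the implementation
referred to in the text: it assumes that $Q$ is positive definite, it takes
the objective to be $(1/2) p'Qp + r'p$ (so $r$ is the linear term with
whatever sign that convention implies), and the names `solve_linear` and
`active_set_weights` are hypothetical.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system (Q not positive definite?)")
        aug[col], aug[piv] = aug[piv], aug[col]
        for row in range(n):
            if row != col:
                f = aug[row][col] / aug[col][col]
                aug[row] = [x - f * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def active_set_weights(Q, r, tol=1e-10, max_iter=1000):
    """Minimize (1/2) p'Qp + r'p subject to sum(p) = 1 and p >= 0."""
    n = len(r)
    p = [1.0 / n] * n            # feasible start (1/N, ..., 1/N)
    zero = set()                 # working set: weights currently fixed at 0
    for _ in range(max_iter):
        g = [r[i] + sum(Q[i][j] * p[j] for j in range(n)) for i in range(n)]
        free = [i for i in range(n) if i not in zero]
        # Linear system restricted to the free weights, with sum(d) = 0
        K = [[Q[i][j] for j in free] + [1.0] for i in free]
        K.append([1.0] * len(free) + [0.0])
        sol = solve_linear(K, [-g[i] for i in free] + [0.0])
        d = [0.0] * n
        for pos, i in enumerate(free):
            d[i] = sol[pos]
        nu = sol[-1]             # multiplier of the constraint sum(p) = 1
        if max(abs(x) for x in d) > tol:
            alpha, blocking = 1.0, None
            for i in free:       # longest step that keeps every weight >= 0
                if d[i] < -tol and p[i] / -d[i] < alpha:
                    alpha, blocking = p[i] / -d[i], i
            p = [pi + alpha * di for pi, di in zip(p, d)]
            if blocking is not None:
                p[blocking] = 0.0
                zero.add(blocking)        # add a constraint to the working set
                continue
        # Multipliers of the active constraints p_i >= 0 at the new point
        lam = {i: g[i] + sum(Q[i][j] * d[j] for j in range(n)) + nu
               for i in zero}
        if not lam or min(lam.values()) >= -tol:
            return p
        zero.remove(min(lam, key=lam.get))   # drop the most negative one
    raise RuntimeError("no convergence")
```

With $Q$ the identity and $r = (-0.6, -0.6, 0.2)$, the unconstrained minimum
has a negative third component, so the sketch fixes the third weight at zero
and returns $(0.5, 0.5, 0)$.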


One or multiple solutions for the weighted mean 


Now, we study the optimization problem to establish under which conditions
there is a single solution. In the case of multiple solutions, we can
consider the optimization of another property, such as dispersion, for
selecting one weighting vector from among those with the same optimal
$D_{WM}(p)$.

Multiple solutions can be caused by redundant information. Note that 
redundant information might imply that the data in some information sources 
is deduced from the data in the other sources. In this case, the matrix Q 
might be singular, and, in some cases, the singularity causes the system to 
have several solutions. We consider these issues in some detail below. 

First, we should consider the case of independent sources. If they are in- 
dependent, then there is a single solution of the minimization problem. This 
is established below. 


Proposition 8.7. Let $A$ be a matrix of examples of dimension $M \times N$
(number of examples $\times$ number of sources); then, if the columns of $A$
are linearly independent, the minimization problem in Equation 8.3.1 has a
single solution.


So, there is a single vector $p$ that minimizes the distance $D_{WM}(p)$.
In the case of dependencies, we have three different situations. The first
one is as follows.


Proposition 8.8. If there is a column $a_k$ in $A$ that is a linear
combination of the other columns $a_i$ in such a way that

$$a_k = \sum_{i \neq k} \alpha_i a_i,$$

with $\sum_{i \neq k} \alpha_i = 1$ and $\alpha_i \geq 0$, then, if the vector
$p$ is an optimal solution of the minimization of $D_{WM}(p)$ with
$p_k \neq 0$, there is at least one other optimal solution $p^*$ such that
$p^*_k = 0$. Naturally, as both are optimal solutions,

$$D(p^*) = D(p).$$


Algorithm 2 Learning weights for the weighted mean
Algorithm LearningWeightsWeightedMean (p: weighting vector;
    Q: Quadratic matrix; r: vector) returns (weighting vector, optimal distance) is
begin
    k := 0; p^0 := p;
    W_0 := the equality constraint corresponding to a = (1 ... 1);
    boolean exit := false;
    while not exit do
        boolean check-lagrange := true;
        g^k := r + Q p^k;
        Compute d^k and λ (Lagrange multipliers) as a solution of:
            Minimize (1/2)(d^k)' Q d^k + (g^k)' d^k
            subject to c_i' d^k = 0 for all i ∈ W_k
        if d^k ≠ 0 then
            α^k := min{1, min{(b_i − c_i' p^k)/(c_i' d^k) | c_i' d^k > 0}};
            p^{k+1} := p^k + α^k d^k;
            if α^k < 1 then
                /* add the restriction (equality constraint) that forces to
                   zero the weight that determined α^k */
                W_{k+1} := W_k + the constraint of the index of α^k;
                check-lagrange := false;
            end if
        end if
        if check-lagrange then
            λ_q := min{λ_i | i ∈ I ∩ W_k};
            if λ_q ≥ 0 then
                exit := true;
            else
                /* drop restriction (equality constraint) corresponding to
                   the qth weight */
                W_{k+1} := W_k − the qth equality constraint
            end if
        end if
        k := k + 1;
    end while
    return < p^k, D(p^k) >;
end


So, we can eliminate the $k$th column, as this does not change the optimal
value of $D_{WM}(p)$.
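One way to see why the column can be dropped is to move the weight of the
redundant source onto the sources it is built from. The short Python sketch
below does this; it is an illustration added here rather than part of the
original argument, and the function name and data are hypothetical. Since
the coefficients $\alpha_i$ are nonnegative and add to one, the result is
again a weighting vector, and it yields the same weighted mean for every
example.

```python
def drop_redundant_source(p, k, alpha):
    """Return a weighting vector with zero weight on source k.

    alpha[i] is the coefficient of column i in a_k = sum_{i != k} alpha_i a_i;
    alpha[k] itself is ignored.
    """
    q = [pi + p[k] * alpha[i] for i, pi in enumerate(p)]
    q[k] = 0.0
    return q


def weighted_mean(p, values):
    return sum(pi * v for pi, v in zip(p, values))
```

For instance, if the third column is the average of the first two, the
weights (0.2, 0.3, 0.5) become (0.45, 0.55, 0) and all outputs are unchanged.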

The second case of dependency is where, for any $a_k$ that can be written
as $a_k = \sum_{i \neq k} \alpha_i a_i$, we have some $\alpha_i < 0$. In this
case, the following result is known.


Proposition 8.9. Let us consider the case where, for all $a_k$ that can be
written as $a_k = \sum_{i \neq k} \alpha_i a_i$ with
$\sum_{i \neq k} \alpha_i = 1$, there exists at least one $\alpha_i$ such that
$\alpha_i < 0$. In this case, one of the weights will be zero.


Nevertheless, it is not known which of the weights will be zero.


Algorithm 3 Optimal solution for the weighted mean
Algorithm optimalSolutionWMn (p: weighting vector; A: Data;
    b: expectedResults) returns weighting vector is
begin
    if A has no linearly dependent columns then
        Compute Q = A'A and r = A'b
        < p, D(p) > = LearningWeightsWeightedMean (p, Q, r);
        return p;
    else
        if there exists a_k such that a_k = Σ_{i≠k} α_i a_i with
           Σ_{i≠k} α_i = 1 and α_i ≥ 0 for all i ≠ k then
            Remove column a_k from matrix A
            Compute Q' and r' accordingly
            < p, D(p) > = LearningWeightsWeightedMean (p, Q', r');
            build p' from p and with p'_k = 0
            return p';
        else
            for i = 1 to N do
                Remove column a_i from matrix A
                Compute Q' and r' accordingly
                < p(i), D(p(i)) > = LearningWeightsWeightedMean (p, Q', r');
                build p'(i) from p(i) and with p'(i)_i = 0
            end for
            return the weighting vector p'(i) with minimal D(p(i));
        end if
    end if
end


The last case occurs when there is dependency, but
$\sum_{i \neq k} \alpha_i \neq 1$.


Proposition 8.10. When, for all linearly dependent columns $a_k$ in $A$
(i.e., $a_k = \sum_{i \neq k} \alpha_i a_i$), we have
$\sum_{i \neq k} \alpha_i \neq 1$, the optimization problem has a single
solution.


Algorithm 3 returns the best solution for any kind of problem, either with
dependencies or without them. The inputs of the algorithm are the data to
be used to determine the parameters, that is, the matrix $A$ and the vector
$b$. Additionally, we should supply an initial weighting vector $p^0$ to
start the iteration process. The outcome of the algorithm is the optimal $p$.

In Section 8.3 we considered the case with multiple solutions, arguing
that an additional property can be optimized. We will illustrate this by
requiring maximal dispersion. Let $\Delta$ be the error for the optimal
solution; then, the minimization problem for the optimal weighting vector
with respect to dispersion is as follows:

    Minimize   $-E(p) = \sum_i p_i \log p_i$
    Subject to
               $D_{WM}(p) = \Delta$                              (8.11)
               $\sum_i p_i = 1$
               $p_i \geq 0$


We now consider this problem restricted to the case of a single dependent
source $x_k$ such that $a_k = \sum_{i \neq k} \alpha_i a_i$ and
$\sum_{i \neq k} \alpha_i = 1$. The set of solutions is characterized in the
following proposition.


Proposition 8.11. Let $A$ be such that there is an
$a_k = \sum_{i \neq k} \alpha_i a_i$ with $\sum_{i \neq k} \alpha_i = 1$, and
its removal leads to a matrix $A'$ with independent columns. Then, all
optimal solutions of the optimization problem for the weighted mean
(Equation 8.3.1) are of the form

$$p' = p - \tau_k \alpha,$$

with $\alpha$ being the $N$-dimensional vector defined from the values
$\alpha_i$ and $\alpha_k = -1$. That is,

$$\alpha = (\alpha_1, \ldots, \alpha_{k-1}, -1, \alpha_{k+1}, \ldots, \alpha_N)$$
and 


Naturally, the solution of the problem stated in Equation 8.11 is of this
form. Therefore, as the entropy is a concave function, finding the optimal
solution with maximum dispersion corresponds to finding the maximum of a
one-variable concave function. The variable of this function is $\tau_k$.
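A one-dimensional search of this kind can be sketched as follows. This is a
minimal sketch under assumptions made here: the admissible interval for
$\tau_k$ is derived only from the requirement that every component of
$p - \tau_k \alpha$ stays nonnegative, a ternary search is used as one
possible search method, and the function names are hypothetical.

```python
import math


def entropy(p):
    """Dispersion E(p) = -sum p_i log p_i, with 0 log 0 taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def max_dispersion_on_line(p, alpha, iters=200):
    """Maximize the entropy of p - tau * alpha over the admissible tau.

    p is one optimal weighting vector and alpha has -1 at the position of
    the dependent source.
    """
    lo, hi = -math.inf, math.inf
    for pi, ai in zip(p, alpha):
        if ai > 0:
            hi = min(hi, pi / ai)     # p_i - tau * a_i >= 0 bounds tau above
        elif ai < 0:
            lo = max(lo, pi / ai)     # and bounds tau below when a_i < 0

    def point(tau):
        return [pi - tau * ai for pi, ai in zip(p, alpha)]

    for _ in range(iters):            # ternary search on a concave function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if entropy(point(m1)) < entropy(point(m2)):
            lo = m1
        else:
            hi = m2
    tau = (lo + hi) / 2.0
    return tau, point(tau)
```

For example, starting from (0.5, 0.5, 0) with a third source that is the
average of the first two, the search spreads the weight evenly over the
three sources.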


8.3.2 OWA Operators 


The algorithms studied in the previous section for the weighted mean can be 
applied without difficulty to the case of the OWA operator. This is due to the 
fact that, from a computational point of view, the only difference between the 
weighted mean and the OWA is the ordering step in the OWA. 

Accordingly, the application of the previous algorithms to the OWA only 
requires us to reorder the data for each example in decreasing order. Then, 
the first weight will correspond to the largest value, the second weight to the 
second largest value, and so on. 
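The reordering step is short enough to show directly. The following Python
sketch, with hypothetical function names, sorts each example in decreasing
order so that any learning method for the weighted mean can be reused, and
evaluates the OWA operator for a given weighting vector.

```python
def reorder_for_owa(rows):
    """Sort the inputs of every example in decreasing order."""
    return [sorted(row, reverse=True) for row in rows]


def owa(weights, values):
    """OWA: the first weight multiplies the largest value, and so on."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))
```

The outcome attached to each example is left untouched; only the inputs are
permuted.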

For illustration, we reconsider the data in Example 8.6, and learn a model 
based on the OWA operator. 


Example 8.12. Let us consider the data in Table 8.5, and assume that our
subjective model is based on the OWA operator. This data is reproduced
in Table 8.6. Then, first we order each record in decreasing order to obtain
Table 8.7. Naturally, the subjective evaluation does not change. Next, we
apply the algorithm based on active set methods described in Section 8.3.1,
and get the following weights: $w_1 = 0.1245$, $w_2 = 0.6385$,
$w_3 = 0.0531$, $w_4 = 0.0000$, $w_5 = 0.1839$. Therefore, when using the OWA
model, we have that the most relevant mark is the second one. This mark
accounts for about 64% of the final mark. The first and the last marks have a
similar importance (0.1245 and 0.1839), and the fourth has no importance.


Student | ML  P  M  L  G | Subjective evaluation

0.9 0.8 0.1 0.1
0.6 0.9 0.2 0.3
0.7 0.7 0.2 0.6
0.9 0.9 0.4 0.4
0.6 0.3 0.9 0.9
0.4 0.2 0.8 0.1
0.2 0.4 0.1 0.2
0.3 0.3 0.8 0.3
0.2 0.1 0.2 0.1
0.2 0.2 0.5 0.1

Table 8.6. Marks given to ten students, and their subjective evaluation


Student | $a_{\sigma(1)}$  $a_{\sigma(2)}$  $a_{\sigma(3)}$  $a_{\sigma(4)}$  $a_{\sigma(5)}$ | Subjective evaluation

Table 8.7. Marks given to ten students, reordered for learning OWA weights


8.3.3 The WOWA Operator 


In the case of the WOWA operator, the optimization problem is not quadratic. 
Therefore, the active set methods described in Section 8.3.1 cannot be applied. 
Nevertheless, the gradient descent method (described in Section 8.3.1) can be 
applied, as the WOWA operator uses two weighting vectors as parameters. 
As the WOWA operator generalizes both weighted mean and OWA, we
can bootstrap the gradient descent with the best solution obtained with either
the OWA or the weighted mean. So, we apply a mixed approach combining
the two methods described above. This is detailed below. We will use $p$ to
denote the WOWA weights used in the weighted mean, and $w$ to denote the
weights used in the OWA operator.
The method for the WOWA is as follows.


1. Solve the optimization problem for the weighted mean and determine $p$.
   Let $D_{WM}(p)$ be the optimal distance achieved. Solve this problem
   using an optimal solver (such as the active set methods described in
   Section 8.3.1).

2. Solve the optimization problem for the OWA operator. Let $w$ be its
   solution and let $D_{OWA}(w)$ be the optimal distance achieved.

3. Compute $D_{WOWA}$ for the pairs $(p, (1/N, \ldots, 1/N))$,
   $((1/N, \ldots, 1/N), w)$, and $(p, w)$. Select the pair with minimal
   $D_{WOWA}$.

4. Define $\Lambda^p$ from $p$ and $\Lambda^w$ from $w$ so that

   $$p = \left( e^{\Lambda_1^p} \Big/ \sum_{j=1}^{N} e^{\Lambda_j^p}, \; \ldots, \; e^{\Lambda_N^p} \Big/ \sum_{j=1}^{N} e^{\Lambda_j^p} \right)$$

   $$w = \left( e^{\Lambda_1^w} \Big/ \sum_{j=1}^{N} e^{\Lambda_j^w}, \; \ldots, \; e^{\Lambda_N^w} \Big/ \sum_{j=1}^{N} e^{\Lambda_j^w} \right)$$

   Unless there is one $p_i$ or $w_i$ equal to zero, such vectors can be
   obtained by defining $\Lambda_i^p = \log p_i$ and
   $\Lambda_i^w = \log w_i$. Zero weights can only be approximated with a
   large enough negative value for $\Lambda$.

5. Apply gradient descent for the WOWA operator. This corresponds to the
   algorithm in Section 8.3.1, where $e^j$ has been defined in terms of the
   WOWA operator. The computation of

   $$\Lambda_i(t+1) = \Lambda_i(t) - \beta \frac{\partial e^j(t)}{\partial \Lambda_i}$$

   uses a numerical approximation of the derivative. The gradient descent is
   applied until some convergence criterion is met.

6. Once the $\Lambda$ parameters are known, obtain the corresponding weights
   $p$ and $w$.

This learning approach is based on the definition of WOWA with two weight- 
ing vectors. Alternatively, we can consider the use of a parametric fuzzy quan- 
tifier. Similar approaches can be applied in this case. 


8.3.4 Choquet Integral 


Now, we consider the problem of parameter determination in the case of the
Choquet integral. As we did with previous methods, we consider again the
minimization of the least sum of squares.


Let $k$ be an integer; then, $\delta_N^k \delta_{N-1}^k \ldots \delta_1^k$ is
the dyadic representation of $k$ when

$$k = 2^0 \delta_1^k + 2^1 \delta_2^k + \cdots + 2^{N-1} \delta_N^k.$$

For example, 101 is the dyadic representation of 5 and 11111 is the dyadic
representation of 31.

Fig. 8.1. Dyadic representation of an integer


We start by considering the most general case of a Choquet integral with an
unconstrained fuzzy measure $\mu$. The problem in Equation 8.4 can be
rewritten as

    Minimize   $D_C(\mu) = \sum_{j=1}^{M} \bigl(CI_\mu(a_1^j, \ldots, a_N^j) - b^j\bigr)^2$
    Subject to
               $\mu(\emptyset) = 0$                              (8.12)
               $\mu(X) = 1$
               $\mu(A) \leq \mu(B)$ when $A \subseteq B$

The problem can be solved as a quadratic problem with linear constraints.
Therefore, the active set methods described in Section 8.3.1 are also
suitable here. We give below the details on how to formulate the problem.

The basic idea is to determine the Möbius transform of the measure instead
of the measure itself. Then, the optimization function should be rewritten
using the Möbius transform, and appropriate constraints should be added to
the problem.

We consider some notation. As usual, let $X = \{x_1, \ldots, x_N\}$ and
$a_i^j = f^j(x_i)$. Now, instead of using $\mu(A)$ to denote the measures of
subsets of $X$ we will use $\mu_k$, with $k \in \{0, \ldots, 2^N - 1\}$. To
do so, we need a mapping between $\mu_k$ and the subsets $A \subseteq X$.
This will be achieved using the dyadic representation of integers (see
Figure 8.1). In particular, let
$\delta_N^k \delta_{N-1}^k \ldots \delta_1^k$ be the dyadic representation of
$k$; then, $\mu_k$ denotes the measure of the following set:

$$\mu_k = \mu(\{x_i \in X \mid \delta_i^k = 1 \text{ for } i = 1, \ldots, N\}).$$

Additionally, we use $\mu_{k(A)}$ to denote the measure $\mu(A)$ of the set
$A$.


Example 8.13. Let $X = \{x_1, x_2, x_3, x_4, x_5\}$; then,
$\mu(\{x_1, x_3, x_4\})$ is represented by $\mu_{13}$, because the dyadic
representation of 13 is 01101. In particular,

$$\delta_5^{13} \delta_4^{13} \delta_3^{13} \delta_2^{13} \delta_1^{13} = 01101.$$

Therefore, $\delta_4^{13} = 1$, $\delta_3^{13} = 1$ and
$\delta_1^{13} = 1$, but $\delta_5^{13} = 0$ and $\delta_2^{13} = 0$.
So, we have $\mu_{13} = \mu(\{x_4, x_3, x_1\})$. Similarly,
$\mu_1 = \mu(\{x_1\})$, $\mu_2 = \mu(\{x_2\})$,
$\mu_3 = \mu(\{x_1, x_2\})$, $\mu_4 = \mu(\{x_3\})$,
$\mu_5 = \mu(\{x_1, x_3\})$, $\mu_6 = \mu(\{x_2, x_3\})$,
$\mu_7 = \mu(\{x_1, x_2, x_3\})$, ..., $\mu_{2^N - 1} = \mu(X)$.


Using this notation, we have that a fuzzy measure can be represented
by the vector $(\mu_0, \mu_1, \ldots, \mu_{2^N - 1})$. As $\mu_0$ should not
be learned, because it is always zero, we only consider the vector
$(\mu_1, \ldots, \mu_{2^N - 1})$, denoted by $\mu^+$. In a similar way, we
will consider the vector $m^+ = (m_1, \ldots, m_{2^N - 1})$ that corresponds
to the Möbius transform of $\mu$. Then, for each example, we can rewrite its
Choquet integral as the product of a vector $a^+$ by a vector $m^+$. Such
a product is the result of rewriting the expression of the Choquet integral.
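The mapping between integers and subsets is easy to express with bit
operations. The following Python sketch uses hypothetical function names and
refers to the elements of $X$ by their 1-based indices.

```python
def subset_of(k, n):
    """Indices i (1-based) of the elements x_i with delta_i^k = 1."""
    return [i for i in range(1, n + 1) if (k >> (i - 1)) & 1]


def index_of(subset):
    """Integer k whose dyadic representation encodes the given indices."""
    return sum(1 << (i - 1) for i in set(subset))


def dyadic(k, n):
    """Dyadic representation of k written with n digits."""
    return format(k, "0{}b".format(n))
```

With five elements, `index_of([1, 3, 4])` gives 13 and `dyadic(13, 5)`
gives the string 01101, in agreement with Example 8.13.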

First, let us recall the definition of the Choquet integral (see
Equation 6.1):

$$CI_\mu(a_1, \ldots, a_N) = \sum_{i=1}^{N} \bigl(a_{s(i)} - a_{s(i-1)}\bigr)\, \mu(A_{s(i)}), \qquad (8.13)$$

where $a_{s(i)} = f(x_{s(i)})$ is defined from the permutation $s$ so that
$0 \leq f(x_{s(1)}) \leq \cdots \leq f(x_{s(N)}) \leq 1$, where
$A_{s(i)} = \{x_{s(i)}, \ldots, x_{s(N)}\}$ and $f(x_{s(0)}) = 0$.
Using the Möbius transform, this expression can be rewritten for each
example $j$ as follows:

$$CI_\mu(a_1^j, \ldots, a_N^j) = \sum_{i=1}^{N} \bigl(a_{s(i)}^j - a_{s(i-1)}^j\bigr) \Bigl(\sum_{A \subseteq A_{s(i)}^j} m(A)\Bigr) = \sum_{i=1}^{N} \sum_{A \subseteq A_{s(i)}^j} \bigl(a_{s(i)}^j - a_{s(i-1)}^j\bigr)\, m(A),$$

where $a_{s(i)}^j$ denotes the $i$th lowest value in
$(a_1^j, \ldots, a_N^j)$. Note that the permutation $s(i)$ depends on the
$j$th example. So, in fact, we have a permutation $s^j$ for each example
$j = 1, \ldots, M$. Similarly, the sets $A_{s(i)}$ also depend on the
example. That is why we use $A_{s(i)}^j$ to denote them.

Now, using a proper ordering of the terms, the expression above can be
rewritten as follows:

$$CI_\mu(a_1^j, \ldots, a_N^j) = \sum_{k=1}^{2^N - 1} a_k^{+j}\, m_k = a^{+j} \cdot m^+.$$

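So, the minimization problem established in Equation 8.12 can be rewritten
as

    Minimize   $D_C(P) = \sum_{j=1}^{M} \bigl(a^{+j} \cdot m^+ - b^j\bigr)^2$
    Subject to                                                  (8.14)
               $\sum_{k=1}^{2^N - 1} m_k = 1$
               $\sum_{B' \subseteq B} m(B') - \sum_{A' \subseteq A} m(A') \geq 0$ for all $A \subseteq B$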


Note that the condition $m(\emptyset) = 0$ is not needed, as $m_0$ is not
included in the model, and that the constraints $\mu(X) = 1$ and
$\mu(A) \leq \mu(B)$ when $A \subseteq B$ are replaced by the appropriate
constraints on $m$.
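The vector of coefficients that multiplies the Möbius transform can be built
directly. Collecting the terms of the double sum shows that the coefficient
of $m(A)$ is the smallest input among the elements of $A$, which is the
well-known expression of the Choquet integral in terms of the Möbius
transform. The Python sketch below relies on that fact; the function names
are hypothetical, and bit $i$ (counting from zero) of the index $k$ stands
for element $x_{i+1}$.

```python
def choquet_coefficients(a):
    """Coefficient vector for one example: entry k is min of a over set k."""
    n = len(a)
    return [min(a[i] for i in range(n) if (k >> i) & 1)
            for k in range(1, 2 ** n)]


def choquet_from_moebius(a, m_plus):
    """Choquet integral as the product of the coefficients and m^+."""
    return sum(c * m for c, m in zip(choquet_coefficients(a), m_plus))
```

Stacking the coefficient vectors of all the examples gives a matrix that
plays the same role as the data matrix in the weighted mean problem.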

This optimization problem has the same structure as the one of the
weighted mean. It is a quadratic optimization problem with linear constraints.
Therefore, it can be solved using the same methods used for the weighted
mean. We illustrate below its application with one example.


Example 8.14. Let us reconsider Example 8.6; but, in this case, we consider
a Choquet integral as the appropriate aggregation operator. Then, using the
model in Equation 8.14, we obtain the fuzzy measure and the corresponding
Möbius transform given in Table 8.8. The Möbius transform permits us to
distinguish the most relevant sets; that is, the sets with a non-negligible
Möbius value. They are the following ones:

    m({L})    = 0.106946016
    m({L, G}) = 0.058218250823
    m({P})    = 0.40697540673
    m({ML})   = 0.4278602983

It can be observed that these most relevant subjects are related to the
subjects distinguished in Example 8.6, where the weighted mean was used for
aggregation. In that case we had $p_{ML} = 0.4244$, $p_P = 0.4108$,
$p_M = 0.0000$, $p_L = 0.1249$, and $p_G = 0.0399$. Thus, again, ML and P
have larger weights, followed by L. While G had a non-null weight in the case
of the weighted mean, we observe that in the case of the Choquet integral G
is only relevant when used in conjunction with L.


Learning constrained fuzzy measures 


The model based on the Choquet integral can be easily extended for learning
two types of fuzzy measures: k-order additive fuzzy measures and belief
functions. We detail how the optimization problem above should be modified so
that the fuzzy measure learned is of one of these types.


k-order additive fuzzy measures: A fuzzy measure is k-order additive when
   the Möbius transform is zero for all subsets of $X$ with cardinality
   larger than $k$. In order to obtain a fuzzy measure of this class for a
   given $k$, the optimization problem should constrain all sets $A$ where
   $|A| > k$ to have $m(A) = 0$. This can be done by adding such constraints
   to the model, or, alternatively, by removing the terms $m(A)$ with
   $|A| > k$ from the set of variables in the optimization problem.

Belief functions: A fuzzy measure is a belief function when the Möbius
   transform is never negative. Therefore, in order to obtain such fuzzy
   measures from the model, we need to add $m_k \geq 0$, for all
   $k = 1, \ldots, 2^N - 1$, to the optimization problem.
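Both modifications amount to simple bookkeeping on the indices of the Möbius
terms. The Python sketch below, with hypothetical function names, lists the
variables that a k-order additive model would remove and checks the belief
function condition on a learned transform.

```python
def variables_to_drop(n, order):
    """Indices k in 1..2^n - 1 whose subset has more than `order` elements."""
    return [k for k in range(1, 2 ** n) if bin(k).count("1") > order]


def is_belief_function(m_plus, tol=0.0):
    """True when no Moebius value is negative (up to a tolerance)."""
    return all(m >= -tol for m in m_plus)
```

For five information sources and a 2-order additive measure, 16 of the 31
variables are removed, which leaves the 5 singletons and the 10 pairs.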


In addition to these two types of fuzzy measures, two other types are
worth mentioning: additive and symmetric. Note that the Choquet integrals
with respect to these measures are, respectively, a weighted mean and an
OWA operator. Therefore, it is simpler to solve the optimization problems
established in Sections 8.3.1 and 8.3.2. Optimal solutions are also obtained
in this case. If the general algorithm described here is used, then the
constraints for additive fuzzy measures are, of course, $m(A) = 0$ for all
$|A| > 1$. In the case of symmetric measures, we will add the constraint
$m(A) = m(B)$ for all subsets of $X$ where $|A| = |B|$.


X = {ML, P, M, L, G}   μ                       Möbius transform
{00000}                0.0                     0.0
{00001}                3.4022290872E-10        3.402229E-10
{00010}                0.106946016             0.106946016
{00011}                0.1651642671632229      0.058218250823
{00100}                8.3601024454E-9         8.36010244E-9
{00101}                8.83795255408E-9        1.3762720999999943E-10
{00110}                0.1069460245331454      1.7304299659848255E-10
{00111}                0.1651642760308463      1.9685075791642248E-10
{01000}                0.40697540673           0.40697540673
{01001}                0.4069754072782796      2.0805673850432527E-10
{01010}                0.5139214228473248      1.1732481652870774E-10
{01011}                0.5721396745940198      3.754153654611514E-10
{01100}                0.4069754253555859      1.0265483518789864E-8
{01101}                0.4069754261791199      1.3762718742427182E-10
{01110}                0.5139214418189969      1.7304313537636062E-10
{01111}                0.5721396942346477      1.9685064689412002E-10
{10000}                0.4278602983            0.4278602983
{10001}                0.4278602992562150      6.159921461801332E-10
{10010}                0.5348063178062644      3.5062643899408386E-9
{10011}                0.5930245705338666      9.48387146593177E-10
{10100}                0.4278603068168202      1.5671774988845755E-10
{10101}                0.4278603080796658      1.6900336685665707E-10
{10110}                0.5348063266691706      1.7304302435405816E-10
{10111}                0.5930245800971049      1.9685075791642248E-10
{11000}                0.8348357051643919      1.3439194201936289E-10
{11001}                0.8348357066267909      2.981271896018711E-10
{11010}                0.9417817250059003      2.179191271878267E-10
{11011}                0.9999999789905173      3.754155875057563E-10
{11100}                0.8348357240771254      1.3042977808908063E-10
{11101}                0.8348357261527856      1.6900347787895953E-10
{11110}                0.9417817446108061      1.7304324639866309E-10
{11111}                0.9999999999960869      1.9685031382721263E-10

Table 8.8. Fuzzy measure μ and its Möbius transform. The first column denotes
the subsets of $X = \{x_1, \ldots, x_5\}$ (a 0 in the $i$th position means
that $x_i$ is not included, while a 1 in the $i$th position means that $x_i$
is included)
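The relation between the two numeric columns of Table 8.8 can be checked
with a few lines of Python: the measure of a set is the sum of the Möbius
values of all its subsets. The sketch below keeps only the four dominant
Möbius values reported in Example 8.14 and treats the remaining ones (all of
the order of 1E-8 or smaller) as zero, which is an approximation made here;
the names are hypothetical.

```python
# Dominant Moebius values from Example 8.14; all others are taken as zero.
MOEBIUS = {
    frozenset({"L"}): 0.106946016,
    frozenset({"L", "G"}): 0.058218250823,
    frozenset({"P"}): 0.40697540673,
    frozenset({"ML"}): 0.4278602983,
}


def measure(subset, moebius=MOEBIUS):
    """mu(A) = sum of m(B) over all B contained in A."""
    target = frozenset(subset)
    return sum(value for b, value in moebius.items() if b <= target)
```

For instance, `measure({"L", "G"})` adds the values of {L} and {L, G} and
agrees with the entry 0.1651642671632229 of the table up to the neglected
terms.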

In the case that the fuzzy measure is a distorted probability, the problem
cannot be solved with a similar approach. Nevertheless, the method described
in Section 8.3.3 is appropriate, as the WOWA operator is equivalent to a
Choquet integral with a distorted probability.


Student | ML  P  M  L  G | Subjective evaluation

0.9 0.8 0.1 0.1
0.6 0.9 0.2 0.3
0.7 0.7 0.2 0.6
0.9 0.9 0.4 0.4
0.6 0.3 0.9 0.9
0.4 0.2 0.8 0.1
0.2 0.4 0.1 0.2
0.3 0.3 0.8 0.3
0.2 0.1 0.2 0.1
0.2 0.2 0.5 0.1

Table 8.9. Marks given to ten students, and their subjective evaluation using
preferences


8.4 Extracting Parameters from Examples: Preferences 
or Partial Orders 


Another case where learning approaches can be considered is when there are 
examples (e.g., alternatives) and a (partial) order is defined over them accord- 
ing to our preferences. The following example illustrates this situation. 


Example 8.15. Let us consider again the students in Example 8.6, and let us 
assume that our subjective overall rate is replaced by our preference on the 
students. Table 8.9 includes the marks of the students and our subjective 
preference (right column). 

The subjective preference now gives an ordering of the students. The order 
is partial, as there are groups of students that are indistinguishable. This is 
the case of the two 1st students in the class (s4 and ss). 


To formulate this problem we consider the examples and a (partial) order
relation <. The set $S$ that represents this relation is defined by the pairs
$(r, t)$ such that $s_r > s_t$ (student $s_r$ is preferred to student
$s_t$). Then, given a model defined by an aggregation operator $C$ with
parameter $P$, the goal is to find $P$ such that, for all $(r, t) \in S$, it
follows that

$$C_P(\text{evaluation-student } r) > C_P(\text{evaluation-student } t)$$

or, following the notation in Section 8.3, where $f^r$ denotes example $r$,

$$C_P(f^r(x_1), \ldots, f^r(x_N)) > C_P(f^t(x_1), \ldots, f^t(x_N)).$$

Naturally, this equation can be rewritten as

$$C_P(f^r(x_1), \ldots, f^r(x_N)) - C_P(f^t(x_1), \ldots, f^t(x_N)) > 0.$$

Although we want this equation to hold for all $(r, t) \in S$, data might be
inconsistent; so, for each $(r, t)$, we consider a variable
$y_{(r,t)} \geq 0$ that we expect to be as small as possible so that the
following equation holds:

$$C_P(f^r(x_1), \ldots, f^r(x_N)) - C_P(f^t(x_1), \ldots, f^t(x_N)) + y_{(r,t)} > 0.$$

Considering all $(r, t) \in S$, we are interested in the parameter $P$ that
minimizes the total amount of violation:

$$\sum_{(r,t) \in S} y_{(r,t)}.$$


Therefore, the problem to minimize is as follows:

    Minimize   $\sum_{(r,t) \in S} y_{(r,t)}$
    Subject to
               $C_P(f^r(x_1), \ldots, f^r(x_N)) - C_P(f^t(x_1), \ldots, f^t(x_N)) + y_{(r,t)} > 0$
               $y_{(r,t)} \geq 0$                                (8.15)
               logical constraints on $P$

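In the particular case where the model uses the weighted mean, this problem
is formulated as

    Minimize   $\sum_{(r,t) \in S} y_{(r,t)}$
    Subject to
               $\sum_{i=1}^{N} p_i \bigl(f^r(x_i) - f^t(x_i)\bigr) + y_{(r,t)} > 0$
               $y_{(r,t)} \geq 0$                                (8.16)
               $\sum_{i=1}^{N} p_i = 1$
               $p_i \geq 0$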
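For a fixed weighting vector, the objective of this problem is easy to
evaluate, which is useful for comparing candidate solutions. The Python
sketch below computes, for each preference pair, the smallest slack that
makes the difference of weighted means nonnegative. Using a non-strict
inequality is a simplification made here, and the function names and data
layout are hypothetical; an actual solution of the problem would require a
linear programming solver.

```python
def weighted_mean(p, values):
    return sum(pi * v for pi, v in zip(p, values))


def violation_slacks(p, examples, preferences):
    """Smallest y >= 0 with WM(f^r) - WM(f^t) + y >= 0 for each pair (r, t).

    `examples` maps an identifier to its list of inputs and `preferences`
    is a list of pairs (r, t) meaning that r is preferred to t.
    """
    slacks = {}
    for r, t in preferences:
        diff = weighted_mean(p, examples[r]) - weighted_mean(p, examples[t])
        slacks[(r, t)] = max(0.0, -diff)
    return slacks


def total_violation(p, examples, preferences):
    """Value of the objective function for the weighting vector p."""
    return sum(violation_slacks(p, examples, preferences).values())
```

A weighting vector that respects every stated preference has a total
violation of zero.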


Other models have been proposed with similar objectives. One of them 
assumes that the difference between two preferred examples should be larger 
than a certain threshold (defined as constant), and then maximizes the mini- 
mum difference between the two alternatives. 


8.5 Analysis 


We focus the analysis on the case of parameter determination from examples.

The two main approaches considered (gradient descent and quadratic
programming) have advantages and disadvantages. In general, quadratic
programming, when applicable, is faster, and enables us to obtain a global
optimum in reasonable time. For this kind of problem, gradient descent is not
adequate, because of its slow convergence and because the initial weighting
vector influences the final result. Besides, there is no one-to-one
correspondence between $\Lambda$ vectors and weighting vectors. Therefore, it
is more efficient to use quadratic programming for the weighted mean and the
OWA operator. The same is true for the Choquet integral for unconstrained
fuzzy measures, as well as for k-order additive fuzzy measures or belief
functions.

Nevertheless, in other types of problems, when the function to minimize is
not quadratic (as is the case with the WOWA operator), active set methods
are complex and difficult to implement. Other approaches are more suitable.
Gradient descent and genetic algorithms are two such approaches.

An additional problem to be considered when learning parameters from
examples is that of missing data. When an example has missing data, we can
either drop the example or adapt the optimization function so that it can be
applied to the remaining data. Some operators can be easily adapted for this
purpose. For example, the OWA operators with quantifiers can be applied to
an arbitrary number of parameters. Nevertheless, such transformations make
the problems more complex and no longer quadratic.


8.6 Bibliographical Notes 


1. Analytic Hierarchy Process: The Analytic Hierarchy Process was de- 
veloped by Saaty between 1971 and 1975. The classical reference is [344]. 
A short paper describing the main insights of the approach is [343]. [390] 
is a Web page that allows us to compute the AHP for a single matrix. The 
consistency ratio described here is not the only one possible, and other 
definitions exist in the literature. See [15] for a recent analysis. The paper 
includes some historical notes on this issue. 

2. Parameters from orness: The selection of parameters from orness
   or degree of disjunction was first proposed by Dujmović in [102] with
   respect to root-mean-powers for two inputs. [103] considered additional
   inputs. The method to determine weights for the OWA as a maximization
   of dispersion given a particular orness value was introduced by O'Hagan
   in [301]. This corresponds to the ME-OWA operator (the first use of this
   term seems to be in [303]). O'Hagan solved the optimization problem using
   a geometric programming formulation (see [301] and [302]). In [302], he
   gives the weighting vectors for all solutions of dimensions 3, 4, 5, and 6
   when the orness is one of {0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6,
   0.55}. For orness equal to 0.5, the optimal solution is $p_i = 1/N$. For
   orness below 0.5, Proposition 7.24 can be used. Proposition 8.5,
   concerning OWA weights, was proved by Carbonell, Mas, and Mayor in [60]
   (1997). This case was independently studied later by Fullér and Majlender
   in [156]. The same authors considered the problem of learning weights for
   OWA given orness, assuring minimal variability (Definition 7.14) for the
   weights. See [157] for details. [438] considered the problem of learning
   weights for OWA with constraints on the weights.

3. Optimization methods: Algorithms for solving optimization problems
   can be found in [162, 240, 300]. Our first implementation (in Java) for
   quadratic problems with linear constraints using active set methods
   followed [240]. There exists (free and commercial) software for solving
   this problem. We are currently using BPMPD by Mészáros [267]. The
   Kappalab package [171] ("laboratory for capacities") developed for the
   language and environment for statistical computing R [327] implements some
   learning methods, as well as some aggregation operators and indices.

4. Parameters from preferences or partial orders: Srinivasan and
   Shocker (1973) [372, 373] were the first to propose linear programming
   for learning the weights of a linear model. Weights are found from a set
   of examples and preferences, as in Example 8.15. They show that the
   quality of the model (goodness of fit) is not influenced by whether the
   linear model is a weighted mean with weights $p$ (that is,
   $\sum_i p_i = 1$) or a linear model with $p$ such that
   $\sum_i p_i = K \neq 1$. Nevertheless, they mainly concentrate on a third
   approach, where goodness of fit ($G$) and poorness of fit ($B$) have a
   difference of $h$ (that is, $G - B = h$). Their model is known as
   LINMAP (LINear programming techniques for Multidimensional Analysis
   of Preferences). Pekelman and Sen (1974) [317] consider in some detail the
   same problem with a weighted mean. They define objective functions to
   minimize the amount of violation (as in the previous works by Srinivasan
   and Shocker), and to minimize the number of violations.

   While in the previous works a single set of preferences $S$ is considered,
   Horsky and Rao (1984) [192] considered the case of different sets of
   preferences. This is modeled by considering a partition of $S$ into
   subsets $S_1, S_2, \ldots$, where $(r, t) \in S_3$ and $(r', t') \in S_1$
   means that the difference between $r$ and $t$ is much larger than the one
   between $r'$ and $t'$. Formally,

   $$[C(f^r) - C(f^t)] > [C(f^{r'}) - C(f^{t'})]$$
   for $(r, t) \in S_a$ and $(r', t') \in S_b$ if $a > b$.

   [432] studies the case of a single set of preferences $S$ when there is a
   group of experts, each with his or her own opinions. Another related
   problem was studied in [79], where weights are learned in a two-class
   classification problem. Although this approach to learning parameters from
   preferences has been mainly restricted to the weighted mean, Meyer and
   Roubens considered the case for the Choquet integral [268].

. Parameters from outcomes. 

a) Weighted mean and OWA operators: The determination of pa- 
rameters for the OWA operators were initially studied by Filev and 
Yager using the gradient descent with the transformation e^ / > j e 
in [140, 140]. This problem was latter studied by Torra using active 
set methods [400] for both OWA and the weighted mean. Results on 
linear dependencies on the data can be found in [406]. This paper also 
considers learning parameters for quasi-arithmetic means. 

b) WOWA operators: An algorithm for learning the parameters of the 
WOWA operator is given in [401] and [408]. This problem is equivalent 
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to the one for solving a Choquet integral with a distorted probability. 
'The consideration of genetic algorithms for learning the parameters 
of the WOWA was considered in [298]. 

c) The Choquet integral: The earliest approaches to learning
parameters for the Choquet integral are due to Mori and Murofushi [273]
and Tanaka and Murofushi [391]. [273] was the first to
consider quadratic programming for learning fuzzy measures. [199]
and [200] present a method based on genetic algorithms. Mathematical
results for this optimization problem are given in [195, 196]. A
related problem was considered in [428]: a regression model based on
a Choquet integral, in which the fuzzy measure as well as a few
additional parameters are learned using genetic algorithms.

As stated above, the case of learning WOWA weights (studied
in [298, 401, 408]) corresponds to determining distorted probabilities.
In this case, both the weights and the functions should be learned.
[298, 401, 408] learn the function through the weighting vector w,
later interpolating the function from w. Identification of Sugeno
λ-measures with genetic algorithms was studied in [72, 222].

Learning k-order additive fuzzy measures has been studied for the
particular case of k = 2 by Marichal and Roubens [253]. They consider
additional constraints on the interaction between information sources;
that is, whether the interaction (Möbius transform) is positive or negative.

d) The Sugeno integral: Learning models for the Sugeno integral have
not been studied much. [198] seems to be the first publication to use
gradient descent for learning the measure in a Sugeno integral. [423]
uses genetic algorithms for learning fuzzy measures (k-maxitive ones)
for a Sugeno integral.

e) Other operators: Learning algorithms have been applied to several
other families of aggregation operators. For example, [38] considers
uninorms, weighted quasi-arithmetic means, and weighted
root-mean-powers; [39] studies methods for the Generalized OWA, the
Generalized Choquet integral, and the Geometric OWA; [197] presents an
algorithm for the twofold integral.


6. Unsupervised learning methods: Research on fuzzy measures and
integrals mainly uses measures either defined heuristically or learned
using supervised approaches. Nevertheless, there is also a trend toward
using unsupervised learning approaches. Soria-Frisch [370, 371] has
developed one such approach, applied to the field of computer vision.
Kojadinovic [212, 213] has also considered this type of learning method.

7. Reinforcement learning methods: The method proposed by Keller
and Osborn in [208] to learn the fuzzy measure for a Sugeno integral can 
be classified as a reinforcement learning approach. The method uses a 
reward and punishment scheme based on the performance of the model in 
a classification task. 

8. Other approaches: Besides the heuristic approach for parameter
determination, in the case of fuzzy measures for fuzzy integrals,
Proposition 5.40, which permits us to determine a Sugeno λ-measure from the
measures on the singletons, has also been used. This is the case in the
paper by Cho and Kim [77], which uses this approach with a Sugeno
integral. The proposition can also be combined with other methods. This is
the case in [429], where genetic algorithms are used to determine the
measures on the singletons (for a Sugeno λ-measure) in a pattern recognition
application. The aggregation model in that paper uses a Sugeno integral.
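
The construction of a Sugeno λ-measure from the measures on the singletons, mentioned in the last item, can be illustrated numerically. The following Python sketch assumes the standard characterization of such measures: for singleton values v_i = μ({x_i}), the parameter λ > −1 solves λ + 1 = Π_i (1 + λ v_i), and μ(A) = (Π_{i∈A} (1 + λ v_i) − 1)/λ. The function names, the bisection solver, and the restriction 0 < v_i < 1 are choices made for this illustration and are not taken from the text.

```python
import math


def sugeno_lambda(singletons, iterations=200):
    """Solve lambda + 1 = prod_i (1 + lambda * v_i) for lambda > -1 by bisection.

    `singletons` holds the measures v_i of the singletons (assumed 0 < v_i < 1).
    """
    v = list(singletons)
    if len(v) < 2 or not all(0.0 < x < 1.0 for x in v):
        raise ValueError("need at least two singleton measures in (0, 1)")
    total = sum(v)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # the measure is additive

    def g(lam):
        return math.prod(1.0 + lam * x for x in v) - (1.0 + lam)

    if total > 1.0:
        # Root in (-1, 0): g is positive to its left and negative to its right.
        lo, hi = -1.0, 0.0
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            if g(mid) > 0.0:
                lo = mid
            else:
                hi = mid
    else:
        # Root in (0, inf): g is negative to its left and positive to its right.
        lo, hi = 0.0, 1.0
        while g(hi) <= 0.0:
            hi *= 2.0
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            if g(mid) > 0.0:
                hi = mid
            else:
                lo = mid
    return (lo + hi) / 2.0


def lambda_measure(subset, singletons, lam):
    """Measure of `subset` (a collection of indices) under the lambda-measure."""
    if lam == 0.0:
        return sum(singletons[i] for i in subset)
    return (math.prod(1.0 + lam * singletons[i] for i in subset) - 1.0) / lam


v = [0.2, 0.3, 0.1]
lam = sugeno_lambda(v)
print(lam, lambda_measure({0, 1}, v, lam), lambda_measure({0, 1, 2}, v, lam))
```

When the singleton measures sum to less than one the resulting λ is positive, when they sum to more than one it lies in (−1, 0), and in both cases the measure of the whole reference set is one up to numerical error. A method such as a genetic algorithm would then only need to search over the singleton values.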


A 


Properties 


We list below the main properties used in this book. The properties are listed 
in alphabetical order. The list is not exhaustive. 


Associativity:

$$(x \circ y) \circ z = x \circ (y \circ z)$$

Internality:

$$\min_i a_i \le C(a_1, \dots, a_N) \le \max_i a_i$$

Neutral element e:

$$x \circ e = x$$

Positive Homogeneity:

$$C(r a_1, \dots, r a_N) = r\,C(a_1, \dots, a_N)$$

for $r > 0$

Reciprocity:

$$C(1/a_1, \dots, 1/a_N) = 1/C(a_1, \dots, a_N)$$

Sensitive: For all $k = 1, \dots, N$ and $a_k \neq a'_k$,

$$C(a_1, \dots, a_{k-1}, a_k, a_{k+1}, \dots, a_N) \neq C(a_1, \dots, a_{k-1}, a'_k, a_{k+1}, \dots, a_N)$$


Separable: $C(a_1, \dots, a_N)$ is separable if there exist functions $g_1, \dots, g_N$ and
$\circ$ (continuous, associative, and cancellative) such that

$$C(a_1, \dots, a_N) = g_1(a_1) \circ g_2(a_2) \circ \cdots \circ g_N(a_N)$$

Symmetry: For any permutation $\pi$ of $\{1, \dots, N\}$,

$$C(a_1, \dots, a_N) = C(a_{\pi(1)}, \dots, a_{\pi(N)})$$


Unanimity: 


C(a,...,a) =a 


This property is also known as identity, reflexivity, or agreement. 

Comparable: $B_1$ and $B_2$ are said to be comparable when either $B_1 \le B_2$ or
$B_2 \le B_1$. $B_1 \le B_2$ if, for all $x, y \in [0,1]$, we have $B_1(x, y) \le B_2(x, y)$ (a
similar definition applies for operators on $[0,1]^N$).
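
Several of the properties above can be tested numerically on sample inputs. The following Python sketch is only an illustration: the helper names, the sample data, and the tolerance are assumptions introduced here, and a check on finitely many inputs can refute a property but cannot prove it.

```python
import itertools
import math


def arithmetic_mean(a):
    return sum(a) / len(a)


def geometric_mean(a):
    return math.prod(a) ** (1.0 / len(a))


def close(x, y, tol=1e-9):
    return abs(x - y) <= tol


def is_unanimous(C, values, n=3):
    """Unanimity: C(a, ..., a) = a."""
    return all(close(C([v] * n), v) for v in values)


def is_internal(C, samples, tol=1e-9):
    """Internality: min a_i <= C(a_1, ..., a_N) <= max a_i."""
    return all(min(a) - tol <= C(a) <= max(a) + tol for a in samples)


def is_symmetric(C, samples):
    """Symmetry: the output is the same for every permutation of the inputs."""
    return all(close(C(list(perm)), C(a))
               for a in samples for perm in itertools.permutations(a))


def is_positively_homogeneous(C, samples, factors=(0.5, 2.0, 10.0)):
    """Positive homogeneity: C(r a_1, ..., r a_N) = r C(a_1, ..., a_N) for r > 0."""
    return all(close(C([r * x for x in a]), r * C(a))
               for a in samples for r in factors)


def is_reciprocal(C, samples):
    """Reciprocity: C(1/a_1, ..., 1/a_N) = 1 / C(a_1, ..., a_N)."""
    return all(close(C([1.0 / x for x in a]), 1.0 / C(a)) for a in samples)


# Hypothetical sample inputs, strictly positive so that 1/a_i is defined.
SAMPLES = [[1.0, 2.0, 4.0], [0.5, 0.5, 3.0], [2.0, 8.0, 1.0]]

for C in (arithmetic_mean, geometric_mean):
    print(C.__name__, is_internal(C, SAMPLES), is_symmetric(C, SAMPLES),
          is_positively_homogeneous(C, SAMPLES), is_reciprocal(C, SAMPLES))
```

On these samples both means pass the internality, symmetry, and positive homogeneity checks, while only the geometric mean passes the reciprocity check.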


B 


Some Aggregation Operators 


For reference, we list below some of the aggregation operators studied in this 
book. The operators are listed in alphabetical order. The definitions use the 
following notation: X is a set of reference or information sources, $f(x_i)$ is
the value supplied by $x_i$ (with $a_i = f(x_i)$), $\mu$ is a fuzzy measure, p and w
are (probabilistic) weighting vectors, u is a possibilistic weighting vector,
$\sigma$ is a permutation such that $a_{\sigma(i)} \ge a_{\sigma(i+1)}$, s is a permutation such that
$a_{s(i)} \le a_{s(i+1)}$, $A_{\sigma(k)} = \{x_{\sigma(j)} \mid j \le k\}$, and $A_{s(k)} = \{x_{s(j)} \mid j \ge k\}$.


Arithmetic mean (AM):

$$\sum_{i=1}^{N} a_i / N$$

Choquet integral (CI):

$$CI_\mu(f) = \sum_{i=1}^{N} [f(x_{s(i)}) - f(x_{s(i-1)})]\, \mu(A_{s(i)})$$

with $f(x_{s(0)}) = 0$


Geometric mean (GM):

$$\left( \prod_{i=1}^{N} a_i \right)^{1/N}$$

Harmonic mean (HM):

$$\frac{N}{\sum_{i=1}^{N} 1/a_i}$$

OWA:

$$\sum_{i=1}^{N} w_i a_{\sigma(i)}$$


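Root-mean-power:

$$\left( \frac{1}{N} \sum_{i=1}^{N} a_i^{\alpha} \right)^{1/\alpha}$$

Sugeno integral (SI):

$$SI_\mu(f) = \max_{i=1,\dots,N} \min\left(f(x_{s(i)}), \mu(A_{s(i)})\right)$$
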
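Twofold integral (TI):

$$TI_{\mu_S,\mu_C}(f) = \sum_{i=1}^{N} \left( \left( \bigvee_{j=1}^{i} f(x_{s(j)}) \wedge \mu_S(A_{s(j)}) \right) \left( \mu_C(A_{s(i)}) - \mu_C(A_{s(i+1)}) \right) \right)$$
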
Weighted maximum (WMax):

$$\max_i \min(u_i, a_i)$$

Weighted mean (WM):

$$\sum_{i=1}^{N} p_i a_i$$

Weighted minimum (WMin):

$$\min_i \max(\mathrm{neg}(u_i), a_i)$$

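WOWA:

$$\sum_{i=1}^{N} \omega_i a_{\sigma(i)}$$

with $\omega_i = w^*\left(\sum_{j \le i} p_{\sigma(j)}\right) - w^*\left(\sum_{j < i} p_{\sigma(j)}\right)$, where $w^*$ is a
nondecreasing function that interpolates the points $\{(i/N, \sum_{j \le i} w_j)\}_{i=1,\dots,N} \cup \{(0,0)\}$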
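
Several of the operators in this appendix can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a fuzzy measure is passed as a function on frozensets of source indices, the Sugeno integral is taken in its usual max-min form, `neg` defaults to 1 − x, and the WOWA interpolation function w* is taken to be piecewise linear, which is only one admissible nondecreasing choice.

```python
def weighted_mean(a, p):
    """WM: sum_i p_i * a_i."""
    return sum(pi * ai for pi, ai in zip(p, a))


def owa(a, w):
    """OWA: sum_i w_i * a_sigma(i), with the inputs sorted in decreasing order."""
    return sum(wi * ai for wi, ai in zip(w, sorted(a, reverse=True)))


def weighted_max(a, u):
    """WMax: max_i min(u_i, a_i)."""
    return max(min(ui, ai) for ui, ai in zip(u, a))


def weighted_min(a, u, neg=lambda x: 1.0 - x):
    """WMin: min_i max(neg(u_i), a_i); neg defaults to 1 - x (an assumption)."""
    return min(max(neg(ui), ai) for ui, ai in zip(u, a))


def _increasing_order(a):
    """Indices s(1), ..., s(N) such that a_s(i) <= a_s(i+1)."""
    return sorted(range(len(a)), key=lambda i: a[i])


def choquet(a, mu):
    """CI: sum_i (a_s(i) - a_s(i-1)) * mu(A_s(i)), with a_s(0) = 0."""
    order = _increasing_order(a)
    total, previous = 0.0, 0.0
    for k, i in enumerate(order):
        # A_s(k) contains the sources from position k onward in increasing order.
        total += (a[i] - previous) * mu(frozenset(order[k:]))
        previous = a[i]
    return total


def sugeno(a, mu):
    """SI: max_i min(a_s(i), mu(A_s(i)))."""
    order = _increasing_order(a)
    return max(min(a[i], mu(frozenset(order[k:]))) for k, i in enumerate(order))


def wowa(a, p, w):
    """WOWA with a piecewise-linear w* through (i/N, w_1 + ... + w_i) and (0, 0)."""
    n = len(a)
    cum_w = [0.0]
    for wi in w:
        cum_w.append(cum_w[-1] + wi)

    def w_star(x):
        x = min(max(x, 0.0), 1.0)
        k = min(int(x * n), n - 1)
        return cum_w[k] + (x * n - k) * w[k]

    total, acc = 0.0, 0.0
    for i in sorted(range(n), key=lambda j: a[j], reverse=True):
        total += (w_star(acc + p[i]) - w_star(acc)) * a[i]
        acc += p[i]
    return total


a = [0.3, 0.9, 0.5]   # hypothetical inputs a_i = f(x_i)
p = [0.2, 0.5, 0.3]   # probabilistic weights of the sources
w = [0.1, 0.3, 0.6]   # OWA weights
u = [1.0, 0.4, 0.7]   # possibilistic weights
print(weighted_mean(a, p), owa(a, w), wowa(a, p, w), weighted_max(a, u))
```

On this data the sketch reproduces some known relations between the operators: the Choquet integral with the additive measure μ(A) = Σ_{i∈A} p_i coincides with the weighted mean, with the symmetric measure μ(A) = Σ_{j≤|A|} w_j it coincides with the OWA, and the Sugeno integral with the possibility measure μ(A) = max_{i∈A} u_i coincides with the weighted maximum.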
membership function, 50, 52, 53, 55, 62,
115, 152, 153, 170, 174, 183 
degree of, 50 
triangular, 51 
meta-knowledge model, 195 
midrange, 15 
minimum homogeneous, 181 
Minkowski's inequality, 93 
mode, 9, 10, 15 
monotonicity, 6, 12, 112, 144 
k-monotonicity, 120 
k-order monotonicity, 120 
totally monotone, 120 
Nagano, 177, 178

necessity measure, 124, 143 

negation, 56 
characterization, 56 
definition, 56 
stable, 98, 99
neutral element, 53, 55, 84, 87, 88 
node 

in a hierarchy (HDFM), 129, 131 
nonmonotonic fuzzy measure, see fuzzy 

measure, nonmonotonic 

nonparametric method, 33 
nonparametrical method, 64 
normal distribution, 33, 34, 39, 43, 64 

multivariate, 33, 34 

univariate, 33, 34 
nullnorm, 84, 103, 104 

odf, 213 


odf, see orness distribution function 
Olympic games, 148 
order statistics, 45, 65, 98, 149, 150, 192 
Ordered Weighted Aggregation 
Operator, see OWA 
ordinally stable 
@-ordinally stable, 94 
orness, 207, 215, 220 
interpretation, 213, 216 
ordinal, 216 
quantifier, 209, 216 
orness distribution function, 212 
outer measure, 115 
outlier, 65, 149, 151, 192 
OWA, 147, 154, 156, 161-164, 169, 170,
172, 175, 192, 194, 211, 220 
andness, 216 
BADD, see OWA, basic defuzzifica- 
tion distribution 
basic defuzzification distribution, 192 
characterization, 192 
definition, 148, 160 
dictatorship, 13 
dispersion, 216 
entropy, 206 
example, 148, 150, 154, 161 
generalizations, 192 
generalized OWA, 192, 246 
geometric OWA, 192, 246 
induced GOWA, 192 
induced OWA, 192 
linguistic OWA, 196 
maximum entropy OWA, 225, 244 
nonmonotonic, 192 
odf, 213 
ordinal OWA, 194
characterization, 194 
orness, 208, 209, 216, 223, 225 
parameter determination, 220, 224, 
225, 235, 237, 244, 245 

quantifier, 159 

quasi-OWA, 192 

weighted, see WOWA 

with importances, 194 
OWMax, 175, 194, 217 
OWMin, 175, 194, 217 


parameter determination, 219 
parametric distribution, 33 
parametric model, 33, 39, 41 
parametric technique, 33 
partition, 31 
Pearson's correlation coefficient, see 
correlation coefficient 
Penrose index, 202 
permissible transformations, 22-25, 64, 
94, 96, 97 
plausibility, 115, 116, 120, 128, 129, 143 
plurality rule, 9, 10, 15 
positive homogeneity, see positive, 
restricted 
possibilistic weighting vector, 172 
possibility distribution, 63, 125, 172, 
175, 176, 181, 182 
possibility measure, 124, 126, 143 
power indices, 214, 219 
Banzhaf value, see Banzhaf value 
Shapley value, see Shapley value 
Shapley-Shubik index, see Shapley- 
Shubik index 
premeasure, 142 
probabilistic weighting vector, 172 
probability distribution, 172, see 
random measure, probability 
distribution 
probability measure, 25-29, 112, 122, 
124, 126, 129, 134, 142, 143 
product moment correlation, see 
correlation coefficient 
Prospect theory, 144 
PROSPECTOR, 84, 89, 104 
pseudo-inverse, see quasi-inverse 


quantifier, 52, 61, 62, 66, 135, 159, 162, 
169, 172, 192, 194, 209, 214, 224 
a-trimmed, 210 
generating functions, 160, 192, 216 
orness, 209, 216 
regular increasing monotone fuzzy 
quantifier, 159 
regular nondecreasing fuzzy quanti- 
fier, 159 
Sugeno λ-quantifier, 140, 210, 211,
216, 224
Yager α-quantifier, 210, 211, 224
quasi-inverse, 54, 55 
quasi-linear means, see mean, quasi- 
weighted 


random measure 
cumulative distribution function 
(cdf), 29 
distribution functions, 29 
probability distribution, 29 
random variable, 28-34, 64 
covariance, 32, 33 
covariance matrix, 32 
cumulative distribution, 29 
density, 29 
distribution, 29 
expectation, 29, 64, 77 
independence, 31-33, 38, 41, 64
law of, 29 
mean, see mean 
median, 41 
moment, 29, 30, 64 
absolute, 30 
central, 30 
first, 30 
second central, 30 
probability density function (pdf), 29 
probability measure induced by, 28 
variance, 30, 32-34, 39 
variance-covariance matrix, 34 
randomness, 25, 115 
reciprocity, 95, 107, 221 
rectangle 
area, 72, 73, 79 
reference set, 26 
regression, 34, 36-41, 46, 65, 246 
linear regression model, 34-37, 39, 
41, 46 


robust, 39, 46, 47, 65 
weighted regression, 39 
regressor, see variable, explanatory 
relation 
binary, 59 
crisp, 59 
fuzzy, 59 
L-T composition, 59 
max-min composition, 59 
residuals, 46 
Retail Price Index, 1 
RIM quantifier, see quantifier, regular 
increasing monotone fuzzy 
quantifier 
RMP, see mean, root-mean-power 
robust regression, see regression, robust 
robust statistics, 39-42, 45, 65, 197
root 
in a hierarchy (HDFM), 130 
root-mean-power, see mean, root-mean- 
power 


sample space, see reference set 
SC, see sensitivity curve 
scale, 21, 23-25, 64, 93, 94 
1-9 scale, 221 
absolute, 23, 24, 64 
aggregation different units, 99 
AHP 1-9 scale, 221 
difference, 25 
grades, 2 
interval, 23-25, 64, 96, 108 
log-interval, 25, 64 
Mohs, 21-24 
nominal, 25 
numerical, 2, 16, 107, 171 
ordinal, 2, 6, 9, 16, 22-24, 52, 64, 97, 
98, 108, 150, 171, 172, 175, 179, 
216, 217 
ratio, 23-25, 64, 94, 95, 99, 100, 108, 
220-222
sensitivity, 13, 101 
sensitivity curve, 42 
separable, 90 
set of indifference, 141 
Shapley value, 198, 206 
characterization, 200 
Shapley-Shubik index, 199 
source 




independence, see variable, indepen- 
dence 
SSE 
sum of squares of errors, 39 
SSR 
sum of squares due to regression, 38 
SST 
total sum of squares, 38 
standard deviation, 30 
state space, see reference set 
subidempotency, 54, 83 
Sugeno λ-measure, see fuzzy measure,
Sugeno λ-measure
Sugeno λ-quantifier, see quantifier,
Sugeno λ-quantifier
Sugeno integral, 108, 111, 147, 163, 171, 
175, 182, 185, 187-191, 194, 196, 
217 
definition, 176 
example, 177 
parameter determination, 246, 247 
superidempotency, 55, 83 
swing voter, 202 
symmetric 
matrix, 38 
symmetric fuzzy measure, see fuzzy 
measure, symmetric 
symmetric operator, 83, 84, 88, 98, 99, 
107, 149 
symmetry, 6, 83, 200, 201, 206 
synthesis of judgements, 81 


t-conorm, 58, 60, 65, 66, 73, 84, 85, 88, 
103, 126, 129, 182-186, 191, 196, 
197, 212 
algebraic sum, 55, 89 
Archimedean, 55, 56, 83, 126, 139 
characterization, 55 
comparable, 57 
continuous superidempotent, 55 
definition, 54 
Hamacher, see Hamacher family 
Lukasiewicz, 55, 126, 127 
maximum, 55, 140 
Sugeno, 55, 127 
Yager, 55, 57 
t-conorm system, 183 
t-norm, 53-56, 58, 60, 65, 66, 73, 83-85,
88, 103, 143, 186, 187, 191, 196, 
197, 212 
algebraic product, 54 
Archimedean, 54 
characterization, 54, 65 
comparable, 57 
continuous, 54 
definition, 53 
Lukasiewicz, 54 
minimum, 54 
Yager, 54 
Tarragona, 24
temperature, 23 
Theory of relativity 
combination of velocities, 84, 103 
Tokyo, 60, 177, 178 
transferable belief model, 143 
transformations 
permissible, see permissible transfor- 
mations 
trimmed mean, see mean, trimmed
truth degrees, 60 
Tsukuba, 178, 179 
twofold integral, 182, 187, 190, 195 
definition, 187 
parameter determination, 246 


unanimity, 6, 91 

unconstrained fuzzy measure, see fuzzy 
measure, unconstrained 

uninorm, 84, 103, 104, 197, 212, 213, 
216, 246 

odf, 213 
union, 53 
uniqueness theorem, 22-24 


València, 24, 99, 100 

variability, 205 

variable 
dependent, see variable, response 
explanatory, 34 


independence, 235, see random 
variable, independence 
independent, see variable, explana- 
tory 
response, 34 
voting, 4, 9, 10, 17, 20, 203 


weight, 192 
determination, see Chapter 8 
weighted geometric mean, 101, 102 
weighted maximum, 108, 172, 175, 176, 
180, 194, 196, 217 
characterization, 194 
weighted mean, 245, see mean, weighted 
weighted minimum, 108, 172, 175, 180, 
194, 196, 217 
characterization, 194 
weighted OWA, see WOWA 
weighted root-mean-power, 101, 108 
weighting functions, 144, see distorted 
probabilities 
weighting vector, 148, 150, see pos- 
sibilistic weighting vector, see 
probabilistic weighting vector 
winsorized mean, see mean, winsorized 
WMax, see weighted maximum 
WMin, see weighted minimum 
WOWA, 144, 147, 154-157, 159, 160, 
163, 169, 170, 193, 194 
continuous, 194 
definition, 155, 160 
example, 162 
interpolation function, 155, 156, 158, 
160, 193 
linguistic WOWA, 196 
m-dimensional, 194 
parameter determination, 236, 
244-246
quantifier, 159, 194 
weights, 156 


Yager α-quantifier, see quantifier, Yager
α-quantifier